Abstract

Building  robust  recognition  systems  requires  a  careful  understanding  of  the  effects  of  error  in  sensed 
features.  In  model-based  recognition,  matches  between  model  features  and  sensed  image  features  typically 
are  used  to  compute  a  model  pose  and  then  project  the  unmatched  model  features  into  the  image.  The 
error  in  the  image  features  results  in  uncertainty  in  the  projected  model  features.  We  first  show  how 
error  propagates  when  poses  are  based  on  three  pairs  of  model  and  image  points.  In  particular,  we 
show  how  to  simply  and  efficiently  compute  the  region  in  the  image  where  an  unmatched  model  point 
might  appear,  for  both  Gaussian  and  bounded  error  in  the  detection  of  image  points,  and  for  both 
scaled-orthographic  and  perspective  projection  models.  This  result  applies  to  objects  that  are  fully  three- 
dimensional,  where  past  results  considered  only  two-dimensional  objects.  The  result  is  based  on  an 
approximation  that  accurately  linearizes  the  relationship  between  matched  image  points  and  unmatched, 
projected  model  points.  Secondly,  based  on  the  linear  approximation,  we  show  how  we  can  utilize  linear 
programming  to  compute  the  propagated  error  region  for  any  number  of  initial  matches.  Finally,  we  use 
these  results  to  extend,  from  two-dimensional  to  three-dimensional  objects,  robust  implementations  of 
alignment,  interpretation-tree  search,  and  transformation  clustering. 
1  Introduction

Given a correspondence between a set of image features
and model features, a general problem in recognition is
to evaluate the correspondence and improve it if necessary.
For instance, for object recognition the model
may be a sparse set of 3D points and line segments. For
aerial images, the model may be a terrain elevation map
that includes the world locations of a small set of landmarks.
In some applications, a user may supply the initial
correspondence, leaving the computer to estimate
and refine the model pose (position and orientation). In
other cases, the computer must find the initial correspondence
as well; this may be done through a combination
of grouping, indexing, and raw search. Important
computations involved in evaluating and improving the
correspondence include (1) deciding whether the correspondence
provides an accurate alignment, (2) determining
which image features could correspond to each unmatched
model feature, and (3) choosing a new match
to extend the correspondence. These computations are
intertwined with the issue of error propagation, that is,
the issue of how error in a set of matched image features
propagates to uncertainty in the predicted image
locations of the remaining model features. We call these
predicted image locations the uncertainty regions of the
model features, and we derive either bounds on these regions
or probability distributions on them, depending on
our model of error.

There are several reasons why it is useful to carefully
understand the propagation of uncertainty, as opposed
to assuming some small, simple uncertainty region
and using it in all cases. First, as we will show,
uncertainty regions can vary quite a bit in size, and
may be quite large for the predicted model features, resulting
in many candidate image features for each prediction.
In particular, grouping techniques commonly
find image features that are close together on an object
(e.g., [11, 8, 25, 31, 27, 29]), and we will see that this
can easily lead to large uncertainty regions. Even when
the matched features are far apart in the image, the uncertainty
regions of the unmatched points may still be
large, due to the depth of the 3D model. Second, both
when the image features are nearby and when they are
far apart, there are situations in which the pose of the
model is unstable, and the uncertainty regions assume
surprising shapes. By understanding the propagation of
uncertainty, then, we can determine exactly where to
look for features, and we can evaluate the stability of
the pose produced by the initial correspondence.

1.1  Summary  of  Results 

Given a set of matched image and model points, we determine
an unmatched model point’s uncertainty region.
We consider this problem for the case in which correspondences
are based on point features. We handle both
scaled-orthographic and perspective projection models.
We  also  consider  two  different  models  of  error.  First, 
we  consider  image  points  detected  with  errors  that  have 
known,  independent  Gaussian  distributions.  Second,  we 
consider  a  bounded  error  model,  in  which  we  suppose 
that the error distributions are unknown. In this case,
we make only the weak assumption that the magnitude
of the error vectors can be bounded by some maximum
number of pixels ε. Given no other information, Gaussians
may be the preferred error distribution, since image
features  are  displaced  by  a  sum  of  error  vectors,  incurred 
over  a  series  of  processes  such  as  digitization,  smoothing, 
and edge detection. A bounded error model may be useful,
however, when errors contain a consistent bias that
results  in  distributions  that  are  significantly  skewed  from 
Gaussian.  In  the  first  case,  we  show  how  Gaussian  error 
in  matched  image  points  propagates  to  an  uncertainty 
region  with  a  Gaussian  distribution  for  an  unmatched 
point.  In  the  second  case,  we  show  how  bounded  error 
in  image  points  propagates  to  a  bounded  uncertainty 
region  describing  the  possible  location  of  an  additional 
model  point. 

First  we  compute  the  uncertainty  regions  for  sets  of 
three matched points. We derive a simple linear expression
that approximates the relationship between the
matched and unmatched points. This relationship allows
us to show that, for bounded error, the uncertainty region
for a fourth point is circular, and to derive analytic
expressions for the center and radius of the circle. For
Gaussian error, this relationship implies that the propagated
distribution of uncertainty is also Gaussian, and
provides  analytic  expressions  for  the  center  and  standard 
deviation.  We  perform  experiments  to  verify  that  these 
expressions  are  accurate  for  the  amount  of  error  that  is 
of  interest  in  most  recognition  applications. 

We  also  take  advantage  of  the  linear  relationship  by 
introducing  a  new  algorithm  that  allows  us  to  determine 
the  uncertainty  region  for  any  number  of  matched  points. 
To  do  this  we  approximate  our  bounded  error  regions 
with  convex  polygons,  and  then  show  that  we  can  use 
linear programming to derive a convex polygon that describes
the uncertainty region of the unmatched model
point.  We  experiment  with  both  synthetic  images  and 
a  real  image  to  observe  the  accuracy  of  the  uncertainty 
regions  that  we  compute,  and  to  determine  the  extent  to 
which  they  shrink  as  we  match  more  points. 

Finally, we show how to extend previous work for linear
projection models to the cases of scaled-orthographic
and perspective projections. Using the linear approximation
we show that we can use Baird’s [6] algorithm to
tell whether a set of matches between image and model
points is geometrically consistent, and that we can apply
Cass’ [12] and Breuel’s [10] algorithms to find, in
polynomial time, the model pose that aligns the maximum
number of model and image features to within
error bounds. We also extend Jacobs’ [28] and Sarachik
and Grimson’s [39] planar alignment algorithms to 3D
objects.

1.2  Projection  Models 

For reference, we review the models of projection that
we refer to in this paper. For perspective projection, we
can write the image position $(x', y')$ of a 3D model
point $(x, y, z)$ in terms of a 3D, rigid rotation
matrix $R$, a 3D translation vector $u$, and a camera focal
length $f$. (Primes are used here to distinguish image
coordinates from model coordinates.) Letting $r_{ij}$ be the
elements of $R$, we have

\[ x' = f \, \frac{r_{11} x + r_{12} y + r_{13} z + u_x}{r_{31} x + r_{32} y + r_{33} z + u_z}, \tag{1} \]

\[ y' = f \, \frac{r_{21} x + r_{22} y + r_{23} z + u_y}{r_{31} x + r_{32} y + r_{33} z + u_z}, \tag{2} \]

where the rows of $R$ are orthonormal, and where we assume
the origin is at the center of projection. When the
focal length $f$ is known, there are six degrees of freedom,
and consequently three corresponding model and image
points are “minimal” to determine the transformation.
Given three corresponding points, there exist up to four
solutions for the model pose [17].

This paper extensively considers scaled-orthographic
(also known as weak-perspective) projection, in which a
3D object is scaled down and projected orthographically
into the image. This projection model is appropriate
when the camera is far from the objects being viewed
with respect to their sizes. In this case, the image position
of $(x, y, z)$ can be written in terms of the first two
rows of a scaled, 3D rotation matrix, $S = sR$, and of
a scaled, 3D translation vector, $b$. Letting $s_{ij}$ be the
elements of $S$, we have

\[ x' = s_{11} x + s_{12} y + s_{13} z + b_x, \tag{3} \]

\[ y' = s_{21} x + s_{22} y + s_{23} z + b_y, \tag{4} \]

where $\| (s_{11}, s_{12}, s_{13}) \| = \| (s_{21}, s_{22}, s_{23}) \|$ and
$(s_{11}, s_{12}, s_{13}) \cdot (s_{21}, s_{22}, s_{23}) = 0$. There are six degrees
of freedom in the scaled-orthographic model-to-image
transformation, and consequently three corresponding
points are minimal to determine the transformation.
Given three corresponding points, the transformation always
exists if the model points are not collinear and it
generally has two solutions [27, 2]; in particular, the scale
factor and translation are always unique, and the rigid
rotation matrix is unique up to a reflection of the rotated
model about a plane parallel to the image.

For 3D linear projection, we remove the two non-linear
constraints on the rotation parameters in the
scaled-orthographic projection model. This transformation
is equivalent to applying a scaled-orthographic
transformation to the model, and then applying a
scaled-orthographic transformation to the resulting image; in
total, this is like taking a picture of a photograph [29].
There are eight degrees of freedom in linear projection,
and four corresponding points are minimal to determine
the transformation. Given a minimal set of matches, this
is the only transformation of the three in which the unmatched
model points can be written linearly in terms
of the matched image points. In particular, let the four
image and model points be $(x'_i, y'_i)$ and $(x_i, y_i, z_i)$,
respectively, for $i = 1, 2, 3, 4$. Then we can obtain the first
row of the transformation by solving

\[ \begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \\ x'_4 \end{pmatrix} =
   \begin{pmatrix} x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \\ x_4 & y_4 & z_4 & 1 \end{pmatrix}
   \begin{pmatrix} s_{11} \\ s_{12} \\ s_{13} \\ b_x \end{pmatrix}. \tag{5} \]

A similar equation holds for the second row of the transformation.
These equations give linear expressions for
the transformation parameters in terms of the image
point coordinates. Since multiplying a matrix by a vector
is a linear operation, applying the computed transformation
to any unmatched model point gives a linear
expression for the model point’s image position in terms
of the matched image points.
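The linear-projection case can be illustrated with a small sketch.
It assumes four non-coplanar model points, so that the 4x4 system of
Equation 5 is invertible, and it solves the system once per image
coordinate. The function names, the singularity tolerance, and the
sample data are hypothetical and are not part of the paper.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # hypothetical tolerance
            raise ValueError("singular system: model points may be coplanar")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_projection(model_pts, image_pts):
    """Return the two rows (s_k1, s_k2, s_k3, b_k) from four matches."""
    design = [[x, y, z, 1.0] for x, y, z in model_pts]
    row_x = solve_linear_system(design, [u for u, _ in image_pts])
    row_y = solve_linear_system(design, [v for _, v in image_pts])
    return row_x, row_y


def project(rows, model_pt):
    """Apply the fitted rows to a model point to predict its image position."""
    x, y, z = model_pt
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in rows)
```

Because both steps are linear, the image position predicted for a fifth
model point is a fixed linear combination of the four matched image
points, which is the property the text relies on.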


1.3  Background 

Due to the value of top-down knowledge in model-based
vision, it is common to generate hypotheses about an
object’s pose based on a small amount of information,
and then to look for evidence to confirm or reject the
hypotheses. In the alignment approach, a small number
of image features are matched to model features to determine
the object’s pose. This pose is used to project
additional model features into the image, which are
matched to nearby image features for verification (e.g.,
Roberts [37], Clark et al. [13], Fischler and Bolles [17],
Lowe [34], Ayache and Faugeras [5], Huttenlocher and
Ullman [27]). In interpretation-tree search, additional
matches between model and image features are then
used to look for more matches, backtracking if enough
valid matches cannot be found (e.g., Bolles and Cain [8],
Goad [18], Grimson and Lozano-Perez [23], Horaud [25]).
To obtain the object’s pose, some approaches use minimal
sets of matches between model and image features
(e.g., Clark et al. [13], Fischler and Bolles [17], Ayache
and Faugeras [5], Horaud [25], Huttenlocher and Ullman [27]).
Other approaches use indexing to match more
than the minimal number before looking for confirming
features (e.g., Rothwell et al. [38], Thompson and
Mundy [43], Lamdan et al. [32], Jacobs [29]).

Most recognition systems take an ad-hoc approach to
the problem of accounting for the effects of sensing error
on the projected positions of unmatched model features.
Some systems match projected model features to
image features if they are separated by a distance that
is less than some threshold (e.g., Clark et al. [13], Fischler
and Bolles [17], Brooks [11], Bolles and Cain [8],
Huttenlocher and Ullman [27]). Other systems rank the
unmatched image features using heuristics involving distance
and orientation, and then pick the feature with
highest rank (e.g., Ayache and Faugeras [5], Lowe [34]).

Many questions remain concerning the performance of
these systems. For example, although we know the minimal
number of features needed to generate a model pose,
we do not know how accurate the pose must be to allow
us to identify the object. In addition, some authors
stress the importance of using a minimal set of features
[17, 27], while others contend that this will not produce
a sufficiently accurate pose for recognition [34]. It is
in general difficult to characterize the conditions under
which these systems will succeed or fail, or to evaluate
the relative effectiveness of the different strategies for
recognition, or to understand the extent to which each
approach makes the best possible use of the information
available. A careful understanding of the effects of sensing
error is a prerequisite to doing all of these.

1.3.1  Two-dimensional  objects 

Recently, there has been considerable effort aimed at
better understanding the effects of error on the matching
process. Some of this work attempts to design algorithms
that are guaranteed to perform well in the presence
of error (e.g., Baird [6], Cass [12], Breuel [10]), but
most relevant to this paper is work that also examines
the propagation of error in recognition systems.

Huttenlocher  [26]  examined  the  effects  of  bounded 
error  on  the  alignment  approach  to  recognition.  This 
analysis  considered  planar  objects  viewed  from  arbitrary 
3D  positions,  assuming  scaled-orthographic  projection. 
Pose was determined by matching three model and image
points. For some situations, Huttenlocher placed
approximate  bounds  on  the  uncertainty  regions. 

Subsequently, Jacobs [28] showed that the true uncertainty
regions are discs, and gave analytic expressions
for their centers and radii. These regions are circular
because in this case the projection model is linear in
such a way that error in any of the three matched image
points causes error in a projected model point that is
identical but scaled by a constant factor. This constant
factor depends on the model structure, but not on the
viewpoint. Consequently, the sizes of the uncertainty
regions are independent of how far apart the three
matched points are in the image, which means the uncertainty
is independent of the pose of the model. Jacobs’
result was used by Grimson et al. [22] to analyze the
false-positive sensitivity of planar alignment.
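The planar, bounded-error case can be illustrated with a short sketch.
It assumes the usual affine form for planar models: a fourth model point
written as m3 = m0 + alpha (m1 - m0) + beta (m2 - m0) projects to the same
combination of the three matched image points. Under that assumption,
moving each image point by at most eps moves the prediction by at most
eps (|1 - alpha - beta| + |alpha| + |beta|), a radius that depends only
on the model coordinates. The function names and sample values are
hypothetical and are not taken from [28].

```python
import math


def predict_fourth_point(i0, i1, i2, alpha, beta):
    """Nominal image position of the fourth point (center of the disc)."""
    w0 = 1.0 - alpha - beta
    return (w0 * i0[0] + alpha * i1[0] + beta * i2[0],
            w0 * i0[1] + alpha * i1[1] + beta * i2[1])


def uncertainty_radius(alpha, beta, eps):
    """Disc radius when each matched image point moves by at most eps."""
    return eps * (abs(1.0 - alpha - beta) + abs(alpha) + abs(beta))


def is_candidate(image_pt, center, radius):
    """True if a sensed image point lies inside the uncertainty disc."""
    return math.hypot(image_pt[0] - center[0], image_pt[1] - center[1]) <= radius
```

Note that the image points enter only through the center of the disc;
the radius is computed from alpha, beta, and eps alone, which mirrors
the viewpoint independence described above.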

A number of researchers have also considered the effect
of Gaussian error on alignment methods. As mentioned
above, for planar objects, each predicted model
point can be written as a linear combination of the
matched image points. Therefore, Gaussian error in the
image points leads to Gaussian uncertainty in every predicted
point (e.g., [42]). Sarachik and Grimson [39] used
this observation to propose a new method of performing
and evaluating alignment approaches to recognition.
Beveridge et al. [7] use a robust method to evaluate particular
model poses.
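As a sketch of this linear propagation, suppose the predicted point is
the weighted sum w0 i0 + w1 i1 + w2 i2 of the matched image points, and
that each image point has independent, isotropic Gaussian error with
standard deviation sigma_j. The prediction is then Gaussian with per-axis
variance equal to the sum of (w_j sigma_j)^2. The weights below reuse
hypothetical affine coordinates (alpha, beta) of a planar model point;
the function names are illustrative only.

```python
import math


def affine_weights(alpha, beta):
    """Weights on (i0, i1, i2) for a planar point with coordinates (alpha, beta)."""
    return (1.0 - alpha - beta, alpha, beta)


def propagated_sigma(weights, sigmas):
    """Per-axis standard deviation of a weighted sum of independent Gaussians."""
    return math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sigmas)))
```

For comparison, under bounded error the weights add in absolute value,
whereas under independent Gaussian error they add in quadrature, so the
Gaussian spread grows more slowly with the size of the weights.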

Error propagation has also been studied in the context
of Geometric Hashing approaches to recognition. Costa
et al. [15] considered the distribution of uncertainty regions
in terms of the affine invariant parameters that describe
the image points. Rigoutsos and Hummel [35, 36]
also considered this issue for Gaussian and uniform error.
Both Costa et al. and Rigoutsos and Hummel then
considered the implications of these results for recognition
schemes. Lamdan and Wolfson [33] considered
the related problem of determining when three image
points provide an unstable basis for Geometric Hashing.
Grimson and Huttenlocher [20] considered the effects of
bounded error on Geometric Hashing, and provided loose
bounds on this effect. Jacobs [28] determined exactly
how bounded error affects Geometric Hashing indices.
Grimson et al. [22] then further developed this result and
used it to analyze the performance of Geometric Hashing
algorithms. Sarachik and Grimson’s [39] results also
apply to Geometric Hashing.

1.3.2  Three-dimensional  objects 

Error propagation is more complex in recognition
systems that deal with fully three-dimensional objects.
Bolles et al. [9] studied how error propagates from the
parameters of a model-to-image transformation to the
predicted model points. Bolles et al. assumed that the
errors in the parameters were independent and normally
distributed and that estimates of the distributions would
be available. Unlike other previous work, Bolles et al.
dealt with perspective projection, which made the relationship
between the error vectors in the transformation
parameters and the predicted points non-linear. In fact,
their analysis is the most similar to our own, because
they took a (first-order) approximation that linearizes
the error-vector relationship. As a result they obtained
Gaussian uncertainty distributions. The main difference
with our work, in addition to our treatment of bounded
error, is that we will let the error be in the matched
image points, instead of assuming we know the distributions
for all of the transformation parameters. Furthermore,
we will derive direct expressions for the predicted
points in terms of the matched points, so that we do not
explicitly go through a rigid transformation.

Recently, Grimson et al. [21] presented a formal analysis
of error propagation starting from the matched image
points, for three-dimensional objects. They considered
scaled-orthographic projection and bounded, circular error.
Starting from three matched points, they provided
a numerical method of bounding the uncertainty in the
transformation parameters. Then they used the bounds
on the parameters to obtain complicated, loose bounds
on the uncertainty regions of the predicted points. Via
these bounds, they analyzed the false-positive sensitivity
of 3D-from-2D alignment and transformation clustering,
in the domain of point features. The numerical technique
is less practical, however, for use at run-time in a
recognition system.

Using the same projection and error models as Grimson
et al. [21], Alter and Grimson [4] presented experiments
that show that the true uncertainty regions tend
to be circular to a good approximation, and presented a
numeric method for more accurately bounding the uncertainty
regions. This technique was used to study again
the false-positive sensitivity of 3D-from-2D alignment,
except also using line features for verification. Alter
and Grimson demonstrated that using points for generating
hypotheses and lines for verification could lead
to robust recognition. As before, the numerical
error-propagation technique is less practical for a real-time
system. Furthermore, the two weak-perspective solutions
lead to two distinct uncertainty regions, which is
not true when the model is planar. Alter and Grimson’s
technique sometimes performed poorly when the two regions
overlapped, because it had difficulty distinguishing
them.

Also for 3D objects, Weinshall and Basri [46] provided
analytic bounds on the amount of error in a least-squares
solution that is used to match four model and
image points. This is useful because, currently, the
least-squares solution itself can be found only through iterative
methods.

For both 3D and 2D objects, Wells [47, 48] used a
Bayesian approach and Gaussian error assumptions to
derive an evaluation function that measures the likelihood
of any given pose. Wells then used heuristic search
and gradient descent methods to find the most probable
pose.

Finally, there has been a great deal of work on finding
a pose that minimizes error, when enough image and
model features have been matched to overdetermine the
pose. Some of this work analyzes the effect that errors
in image features have on the accuracy of the resulting
pose, including Kumar and Hanson [30] and Hel-Or
and Werman [24]. The work of Hel-Or and Werman is
particularly relevant to us, because they also consider
how error propagates through the pose to the projections
of unmatched feature points. Assuming Gaussian
error,  they  use  an  extended  Kalman  filter  to  find  the 
minimal  error  pose  resulting  from  a  match  between  any 
number  of  image  and  model  points.  The  Kalman  filter 
then  allows  them  to  compute  a  Mahalanobis  distance 
that  indicates  the  likelihood  that  error  can  account  for 
the  apparent  deviation  between  a  projected  model  point 
and  a  potentially  matching  image  point. 
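The distance test mentioned here can be sketched as follows. This is
only the final gating step, not the extended Kalman filter of [24]: it
assumes that the predicted image position and its 2x2 covariance are
already available, and the function names are hypothetical. The gate
uses the fact that a squared Mahalanobis distance in two dimensions
follows a chi-square distribution with two degrees of freedom, whose
quantile for probability p is -2 ln(1 - p).

```python
import math


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2D point from a Gaussian (mean, cov)."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # Quadratic form with the inverse covariance (1/det) [[d, -b], [-c, a]].
    return math.sqrt((d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det)


def gate_radius(confidence):
    """Mahalanobis radius containing the given probability mass in 2D."""
    return math.sqrt(-2.0 * math.log(1.0 - confidence))


def is_plausible_match(point, mean, cov, confidence=0.95):
    """Accept an image point if error can account for its deviation."""
    return mahalanobis_2d(point, mean, cov) <= gate_radius(confidence)
```

With an identity covariance the distance reduces to the Euclidean
distance, so the gate becomes a disc; a general covariance turns it
into an ellipse aligned with the directions of largest uncertainty.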

In  summary,  there  are  simple  analytic  solutions  for 
how  error  propagates  from  three  matched  image  points, 
when  the  objects  are  two-dimensional  and  undergo 
scaled-orthographic  projection.  This  is  true  both  when 
the  image-point  error  is  bounded  by  circles  and  when  it 
is normally distributed. In the case of circular error, every
propagated uncertainty region is a circle, whose size
is  independent  of  the  camera  viewpoint. 

For three-dimensional objects, it appears empirically
that circular error again propagates to circular uncertainty
regions. Nevertheless, there is no analytic solution,
which would be preferred for building an efficient
system. As well, current numerical solutions either significantly
overestimate the uncertainty regions or can
break down when the two regions that arise from the
two weak-perspective solutions overlap. Further, it is
not known whether the uncertainty regions are exactly
or approximately circles, or whether the sizes of the regions
depend on the viewpoint. If the regions are circles
only approximately, one would like to know which configurations
of the model and image points cause the regions
to deviate from circularity. Although much progress has
been made in understanding the effects of propagated
error, there are significant problems that are not yet understood.

Finally, there have been a number of sensitivity analyses
that determine the susceptibility of recognition systems
to false-positive errors. Most of these analyses
are  restricted  to  two-dimensional  objects,  because  this 
is  where  error  propagation  is  most  readily  understood. 
Nonetheless,  there  do  exist  sensitivity  analyses  for  three- 
dimensional  objects,  which  use  numerical  techniques  to 
get  a  handle  on  the  propagated  error. 

2  Fourth-Point  Uncertainty  Region 

In this section, we address the following problem: Given
exactly three matching point pairs, $(i_0, m_0)$, $(i_1, m_1)$,
and $(i_2, m_2)$, where the locations of $i_0$, $i_1$, and $i_2$ contain
small amounts of error, what is the error in the computed
image position of a fourth model point, $m_3$? This section
presents an analytic solution to this problem, which
is based on a first-order approximation, and results in
a linear relationship between the errors in the matched
(basis) image points and the error in an unmatched, projected
model point. Here we consider weak-perspective
projection, and in a later section we extend the results
to perspective.

Figure 1: Every model point has unique $(d_i, r_i, \tau_i)$ coordinates,
unless it is on the line through $m_{01}$, where
$\tau_i$ is unrestricted. Note that $\tau_i = 0$ for points in the
basis plane.

For weak-perspective projection, we show that the linear
relationship takes a simple form that can be used to
predict the uncertainty region for an unmatched model
point when there is bounded error or Gaussian error in
the image points. When we allow the three basis image
points to be perturbed within bounded error regions, the
resulting uncertainty region is also bounded. When we
allow for Gaussian error in the image points, the uncertainty
region is a probability distribution. Previously
these uncertainty regions were known analytically only
for planar objects [28, 39]. Our results have no such restriction,
and they reduce to the known solutions when
the model is planar. Furthermore, when the error in the
image points is bounded by circles, the region takes the
form of a circle centered at the “nominal point,” which
is the point that $m_3$ projects onto when there is no error
in the basis points. This result agrees with the experimental
observations of Alter and Grimson [4].

2.1  The  Basic  Geometry 

We begin by examining the propagated uncertainty when
there is error in exactly one of three matched image
points, $i_2$. To do this, we introduce a particular representation
for any third model point that allows us to
see how a change in the location of the third image point
affects the projected location of any unmatched model
point. In this representation, we let the origin of the
image coordinate system be at $i_0$, the $z$ direction be orthogonal
to the image plane, and the $x$ axis point in the
same direction as $i_{01}$, where $i_{01} = i_1 - i_0$.

Furthermore, we use the following representation of
3D model points in terms of the three basis model points
(see Fig. 1): Originally, the model points lie in some
model coordinate system. For any model point $m_i$, $i > 0$,
let $r_i$ be the length of the perpendicular from $m_i$ to the
infinite line containing $m_0$ and $m_1$, let $d_i$ be the distance
from $m_0$ to the intersection of the perpendicular and the
line, and let $\tau_i$ be the rotation off the basis plane (the
plane containing the three basis points).

A view of a 3D model is determined by choosing
the six parameters of a weak-perspective transformation
that will be applied to the model. (There are two parameters
for in-plane translation, three for rotation, and
one for scale.) In this section we have fixed $i_0$ and $i_1$. By
fixing the locations in the image where two of the model
points project, we have determined four of the transformation’s
parameters. In particular, we initially can
rigidly transform and scale the model so that $m_0 = i_0$,
$m_1 = i_1$, and $m_2$ is in the $z = 0$ plane; in so doing, the
model is scaled by

\[ s_0 = \frac{\| i_{01} \|}{\| m_{01} \|}. \tag{6} \]


In order to keep $m_0$ and $m_1$ projecting onto $i_0$ and $i_1$,
respectively, there can be no further in-plane translation
nor in-plane rotation. As shown in Fig. 2, we still are
free to rotate about the $y$ axis as long as $m_1$ continues
to project onto $i_1$, which means that any such rotation
about the $y$ axis determines the scale factor. After rotating
about the $y$ axis and rescaling, the only remaining
degree of freedom is a rotation about the vector $m_{01}$.

Next we derive an expression for the image position
of $m_2$ as a function of the two free parameters. As illustrated
in Fig. 2, the model is scaled by $s$, then rotated
by $\phi$ about the $x$ axis, and then rotated by $\theta$ about the
$y$ axis (denoted by $R_{\theta,y}$). This aligns the projections
of the three model points with their corresponding image
points. As in Fig. 1, we let $m_2$’s coordinates relative
to the basis model points be $(d_2, r_2, \tau_2) = (d, r, 0)$; the
last element is 0 since $m_2$ is in the basis plane. So
$m_2 = R_{\theta,y}(s d, s r \cos\phi, s r \sin\phi)$, where $\phi \in [0, 2\pi)$,
which gives

\[ m_2 = s \begin{pmatrix} d \cos\theta - r \sin\theta \sin\phi \\ r \cos\phi \\ d \sin\theta + r \cos\theta \sin\phi \end{pmatrix}. \tag{7} \]

By our choice of image coordinate system, $\theta \in [0, \pi/2)$.
Project orthographically:

\[ (x, y) = (s d \cos\theta, 0) + s r (-\sin\theta \sin\phi, \cos\phi). \]

As $m_1$ rotates around the $y$ axis, the scale factor is constrained to
keep $m_1$ projecting onto $i_1$. From Fig. 2, this constraint is
$s \| m_{01} \| \cos\theta = \| i_{01} \|$, which implies

$$ s = \frac{\| i_{01} \|}{\| m_{01} \|} \frac{1}{\cos\theta} = \frac{s_0}{\cos\theta}. \tag{8} $$

Consequently,

$$ (x, y) = (s_0 d,\ 0) + s_0 r \left( -\tan\theta\sin\phi,\ \frac{\cos\phi}{\cos\theta} \right), \tag{9} $$

which gives the image position of $m_2$ as a function of the
"out-of-plane" rotation angles $\theta$ and $\phi$. Fig. 3 shows
graphically the successive rotations of $m_2$ by $\theta$ and $\phi$,
followed by the orthographic projection.

By setting $(x, y)$ in Equation 9 to the nominal location of $i_2$, we
could solve for the nominal values of $\theta$ and $\phi$. If done, this
would provide a solution to the problem of recovering 3D pose from three
corresponding points. There are several solutions to this problem already,
however, and instead we could apply one of them and then use the solution
to compute $\theta$ and $\phi$ (see [3] for a review of the solutions).
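As a quick numerical illustration of Equations 7-9, the following minimal Python sketch evaluates the image position of the third model point in two ways: directly from Equation 9, and by scaling, rotating, and dropping the z coordinate. The function names and the sign convention for the rotation about the y axis are assumptions chosen here to agree with Equation 7; they are not taken from the original text.

```python
import math


def project_third_point(s0, d, r, theta, phi):
    """Image position (x, y) of m2 according to Equation 9."""
    x = s0 * d - s0 * r * math.tan(theta) * math.sin(phi)
    y = s0 * r * math.cos(phi) / math.cos(theta)
    return x, y


def project_by_rotation(s0, d, r, theta, phi):
    """Same position obtained by scaling, rotating, and projecting."""
    s = s0 / math.cos(theta)  # Equation 8: rescale so m1 stays on i1
    # Model point after scaling and rotating by phi about the x axis.
    px, py, pz = s * d, s * r * math.cos(phi), s * r * math.sin(phi)
    # Rotate by theta about the y axis (sign convention assumed to match Eq. 7),
    # then drop z for the orthographic projection.
    x = px * math.cos(theta) - pz * math.sin(theta)
    y = py
    return x, y
```

Both functions should agree to rounding error for any theta in [0, pi/2), which is a convenient sanity check on the algebra leading to Equation 9.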


2.2  First-Order  Approximation 


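Next we allow for error in one of the basis points, $i_2$. The problem is
to determine how the projected fourth model point changes as a function of
$i_2$, with $i_0$ and $i_1$ remaining fixed. In Section 2.4, we explain how
to extend the result to the case where all three points can move. The
out-of-plane rotations, $\theta$ and $\phi$, and the scale will change as
$i_2$ moves in the plane. If the changes in $\theta$ and $\phi$ are
sufficiently small, then the changes in $x$ and $y$ as a function of
$\theta$ and $\phi$ will be given by their first derivatives. From
Equation 9,

$$ \frac{\partial x}{\partial\theta} = -\frac{s_0 r \sin\phi}{\cos^2\theta}, \quad
   \frac{\partial y}{\partial\theta} = \frac{s_0 r \sin\theta\cos\phi}{\cos^2\theta}, \quad
   \frac{\partial x}{\partial\phi} = -s_0 r \tan\theta\cos\phi, \quad
   \frac{\partial y}{\partial\phi} = -\frac{s_0 r \sin\phi}{\cos\theta}, \tag{10--13} $$

and $t_\theta = \left( \frac{\partial x}{\partial\theta}, \frac{\partial y}{\partial\theta} \right)$
and $t_\phi = \left( \frac{\partial x}{\partial\phi}, \frac{\partial y}{\partial\phi} \right)$
are tangent vectors at the point $(x, y)$ to the image curves that are
traced out by changing $\theta$ and $\phi$. Their magnitudes are

$$ \| t_\theta \| = \frac{s_0 r}{\cos^2\theta} \sqrt{\sin^2\phi + \sin^2\theta\cos^2\phi} \tag{14} $$

$$ \| t_\phi \| = \frac{s_0 r}{\cos\theta} \sqrt{\sin^2\phi + \sin^2\theta\cos^2\phi} \tag{15} $$

For a small change in $\theta$, let $\alpha_\theta$ represent the
direction in which the image point $i_2$ moves, measured counterclockwise
from the $x$ axis, and similarly for $\phi$ and $\alpha_\phi$. Then

$$ \alpha_\theta = \tan^{-1}(-\sin\phi,\ \sin\theta\cos\phi) \tag{16} $$

$$ \alpha_\phi = \tan^{-1}(-\sin\theta\cos\phi,\ -\sin\phi) \tag{17} $$

where the two-argument inverse tangent gives the direction of the vector
formed by its arguments. The dot product of the arguments to the inverse
tangent is 0, and so the tangent vectors $t_\theta$ and $t_\phi$ are
perpendicular. We can use the normalized cross product of the arguments to
see whether the angle between $t_\theta$ and $t_\phi$ is $\pm 90^\circ$:

$$ \sin(\alpha_\phi - \alpha_\theta) = \frac{(-\sin\phi)(-\sin\phi) - (\sin\theta\cos\phi)(-\sin\theta\cos\phi)}{\| (-\sin\phi,\ \sin\theta\cos\phi) \| \, \| (-\sin\theta\cos\phi,\ -\sin\phi) \|}, $$

which equals 1. Thus $\alpha_\phi = \alpha_\theta + 90^\circ$.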
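The tangent-vector claims in this section can be checked numerically. The sketch below is only illustrative: it writes out the partial derivatives obtained by differentiating Equation 9 by hand (an assumption of this example, since they are derived here rather than quoted), and provides helpers for testing that the two tangents are perpendicular with a positive cross product. The function names and default parameter values are hypothetical.

```python
import math


def image_point(theta, phi, s0=1.0, d=0.0, r=1.0):
    """Equation 9: image position of the third model point."""
    return (s0 * d - s0 * r * math.tan(theta) * math.sin(phi),
            s0 * r * math.cos(phi) / math.cos(theta))


def tangents(theta, phi, s0=1.0, r=1.0):
    """Tangent vectors t_theta and t_phi, from differentiating Equation 9."""
    c = math.cos(theta)
    t_theta = (-s0 * r * math.sin(phi) / c ** 2,
               s0 * r * math.sin(theta) * math.cos(phi) / c ** 2)
    t_phi = (-s0 * r * math.tan(theta) * math.cos(phi),
             -s0 * r * math.sin(phi) / c)
    return t_theta, t_phi


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def cross(u, v):
    """z component of the 2D cross product u x v."""
    return u[0] * v[1] - u[1] * v[0]
```

For any pose away from theta = phi = 0, `dot(t_theta, t_phi)` should vanish and `cross(t_theta, t_phi)` should be positive, which corresponds to the +90 degree relationship between the two directions.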

2.3  Relative  Error  between  the  Third  and 
Fourth  Points 

We still are considering the case where only $i_2$ moves. This section
shows that if the projected third model point moves by a small amount,
then any projected fourth model point moves by a constant factor times
that small amount, and in an analogous direction. We also show that the
two weak-perspective solutions lead to different constant factors.

Equations 14 and 15 give the magnitude of the changes in $m_2'$ for small
changes in $\theta$ and $\phi$. The only differences for $m_3'$ are in the
relative coordinates of the model points (Fig. 1). For $m_2$, the
coordinates are $(r_2, d_2, \tau_2) = (r, d, 0)$. For $m_3$, let the
coordinates be $(r_3, d_3, \tau_3) = (r', d', \tau')$. Consequently, in
the expressions, we change $r \to r'$ and $d \to d'$. Further, $\theta$
and $s_0$ do not change since they are measured with respect to $m_0$ and
$m_1$, and do not depend on whether we are considering $m_2$ or $m_3$.
However, $\phi$ is the amount of rotation of $m_2$ away from the plane of
the wedge in Fig. 3. For any other model point, $m_3$, the amount of
rotation away from the wedge is given by $\phi + \tau'$, since, according
to Fig. 1, $\tau'$ is the amount of rotation of $m_3$ away from $m_2$.
Hence for $m_3$, we change $\phi \to \phi + \tau'$. All together, we get

$$ \| t_\theta' \| = \frac{s_0 r'}{\cos^2\theta} \sqrt{\sin^2(\phi + \tau') + \sin^2\theta\cos^2(\phi + \tau')} $$

$$ \| t_\phi' \| = \frac{s_0 r'}{\cos\theta} \sqrt{\sin^2(\phi + \tau') + \sin^2\theta\cos^2(\phi + \tau')} $$

Note that $\| t_\theta' \| / \| t_\theta \| = \| t_\phi' \| / \| t_\phi \|$.
Therefore, when either $\phi$ or $\theta$ changes, the ratio of the size
of the change in $m_3'$ to the size of change in $m_2'$ equals

$$ S = \frac{r'}{r} \sqrt{\frac{\sin^2(\phi + \tau') + \sin^2\theta\cos^2(\phi + \tau')}{\sin^2\phi + \sin^2\theta\cos^2\phi}}. \tag{18} $$

Note that due to $\theta$ and $\phi$, this expression depends on the
viewpoint the model is observed from, which is not true in the planar
case [28]. In the planar case $\tau' = 0$, because all model points are in
the basis plane (Fig. 1), and Equation 18 reduces to $r'/r$.

Recall that there are two reflective solutions to the weak-perspective
geometry [2]. The two solutions correspond to a reflection of the basis
model points about the image plane. From Fig. 3, this reflection
corresponds to negating $\theta$ and $\phi$. Plugging into Equation 18,
the constant scaling factors for the two solutions are

$$ \frac{r'}{r} \sqrt{\frac{\sin^2(\sigma\phi + \tau') + \sin^2(\sigma\theta)\cos^2(\sigma\phi + \tau')}{\sin^2(\sigma\phi) + \sin^2(\sigma\theta)\cos^2(\sigma\phi)}}, \quad \text{where } \sigma = \pm 1, $$

$$ = \frac{r'}{r} \sqrt{\frac{\sin^2(\sigma\phi + \tau') + \sin^2\theta\cos^2(\sigma\phi + \tau')}{\sin^2\phi + \sin^2\theta\cos^2\phi}}. \tag{19} $$

Thus the two weak-perspective solutions give different scaling constants.
Again, this differs from the planar case, in which the scaling constant in
both cases is $r'/r$. More generally, in the planar case the two solutions
collapse to one when projected onto the image, and so the existence of two
solutions makes no difference.

From Equations 16 and 17,

$$ \alpha_\theta' = \tan^{-1}(-\sin(\phi + \tau'),\ \sin\theta\cos(\phi + \tau')) \tag{20} $$

$$ \alpha_\phi' = \tan^{-1}(-\sin\theta\cos(\phi + \tau'),\ -\sin(\phi + \tau')) \tag{21} $$

Through the same calculation that showed
$\sin(\alpha_\phi - \alpha_\theta) = 1$, we can calculate that
$\sin(\alpha_\phi' - \alpha_\theta') = 1$, so that

$$ \alpha_\phi' - \alpha_\theta' = \alpha_\phi - \alpha_\theta = 90^\circ. \tag{22} $$

Thus the angles between the tangent vectors and their relative sizes are
the same for $m_2'$ and $m_3'$. As a note, this implies that the mapping
between curves traced out by changing $\theta$ and $\phi$ for $m_2'$ and
$m_3'$ is conformal [1].

Since we are making a first-order approximation, any movement of $m_2'$
in the image plane can be viewed as the sum of the effects of changes in
$\theta$ and $\phi$. From
$\| t_\theta' \| / \| t_\theta \| = \| t_\phi' \| / \| t_\phi \|$, we see
that for any change in $m_2'$ by some small amount, there is a change in
$m_3'$ by that amount times a constant, given by Equation 18. Furthermore,
as $\theta$ changes, $m_2'$ and $m_3'$ each moves in some direction, by
some amount. Then as $\phi$ changes, Equation 22 implies that the two
points move at right angles to their previous directions. Hence any change
in $m_2'$ produces a change in $m_3'$ that is scaled and rotated by fixed
amounts. Consequently, any error region about the nominal position of
$m_2'$ results in a mathematically similar error region about the nominal
position of $m_3'$, which means they are related by an image plane
translation, rotation, and scaling.

We can explicitly write the relationship between the errors in $m_2'$ and
$m_3'$ using a $2 \times 2$ scaled rotation matrix $A$. ($A$ is a
similarity transform with zero translation. In the sequel we will refer to
$A$ interchangeably as a scaled rotation matrix or a similarity
transform.) $A$ must satisfy

$$ t_\theta' = A t_\theta \quad \text{and} \quad t_\phi' = A t_\phi, \tag{23} $$

which gives four equations in four unknowns. Actually only two of the
equations are needed: In general, let $a_{11}$, $a_{12}$, $a_{21}$, and
$a_{22}$ be the elements of a $2 \times 2$ matrix $A$. Then for a
similarity transform, $a_{21} = -a_{12}$ and $a_{22} = a_{11}$. Solving
the equations leads to (Appendix A)

$$ a_{11} = k \left( \cos\tau' - \cos^2\theta\cos\phi\cos(\phi + \tau') \right) \tag{24} $$

$$ a_{12} = -k \sin\tau' \sin\theta \tag{25} $$

$$ k = \frac{r'}{r} \, \frac{1}{1 - \cos^2\theta\cos^2\phi} \tag{26} $$

Note that the constant scale $S$ from Equation 18 must equal
$\sqrt{a_{11}^2 + a_{12}^2}$.
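To make Equations 24-26 concrete, here is a minimal Python sketch that builds the scaled rotation matrix and evaluates the scale factor of Equation 18 separately, so the two can be compared. The function and parameter names (`similarity_matrix`, `scale_factor`, `tau_p`, `r_p`) are assumptions introduced for this illustration. The formulas are undefined at the degenerate pose where cos(theta)cos(phi) = ±1, which the sketch does not guard against.

```python
import math


def similarity_matrix(theta, phi, tau_p, r, r_p):
    """Scaled rotation matrix A of Equations 24-26, as ((a11, a12), (a21, a22))."""
    k = (r_p / r) / (1.0 - math.cos(theta) ** 2 * math.cos(phi) ** 2)   # Eq. 26
    a11 = k * (math.cos(tau_p)
               - math.cos(theta) ** 2 * math.cos(phi) * math.cos(phi + tau_p))  # Eq. 24
    a12 = -k * math.sin(tau_p) * math.sin(theta)                        # Eq. 25
    return ((a11, a12), (-a12, a11))  # similarity: a21 = -a12, a22 = a11


def scale_factor(theta, phi, tau_p, r, r_p):
    """Scale S of Equation 18, computed independently of the matrix."""
    num = math.sin(phi + tau_p) ** 2 + math.sin(theta) ** 2 * math.cos(phi + tau_p) ** 2
    den = math.sin(phi) ** 2 + math.sin(theta) ** 2 * math.cos(phi) ** 2
    return (r_p / r) * math.sqrt(num / den)
```

Comparing `math.hypot(a11, a12)` with `scale_factor(...)` for a few poses is a simple way to confirm the note above; in the planar case (`tau_p = 0`) the matrix should reduce to `r_p / r` times the identity.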


2.4  General  Formula  for  the  Fourth  Point 
Error 

We can use Equations 24-26 to obtain a formula for the error in $m_3'$ as
a function of the error in $i_0$ or $i_1$ in the same way. This gives
three scaled rotation matrices relating the individual errors in the basis
points to the error in $m_3'$. Under a first-order approximation, these
errors affect the error in $m_3'$ independently and the total error in
$m_3'$ is the sum of the individual errors. Let $A$ be the scaled rotation
matrix between $i_0$ and $m_3'$, $B$ be the scaled rotation matrix between
$i_1$ and $m_3'$, and $C$ be the scaled rotation matrix between $i_2$ and
$m_3'$. Also let $e_0$, $e_1$, and $e_2$ be the errors in the basis
points, and, for $i \ge 3$, let $e_i$ be the error vector from the nominal
position of the $i$th projected model point to its true position (see
Fig. 4). Then the error in $m_3'$ is given by the following linear
relationship:

$$ e_3 = A e_0 + B e_1 + C e_2. \tag{27} $$

Figure 4: For each model point $m_i$, there are three points of interest
in the image: (1) its nominal (detected) position $i_i$, which is
determined by the feature detector, (2) its no-error (true) position,
which would equal $i_i$ if there were no error, and (3) its predicted
position, which is computed using the first three point pairs to compute
the pose and project $m_i$ into the image. For $i < 3$, the predicted
position equals $i_i$. For any $i$, we define $e_i$ to be the correction
vector from the predicted position to the true position. The two panels
show the cases $0 \le i < 3$ and $i \ge 3$.

The two weak-perspective pose solutions, $\pm(\theta, \phi)$, lead to two
possibilities for each of $A$, $B$, and $C$, and for the error in the
fourth point $e_3$. When we combine the errors in Equation 27, we must be
sure to use the same weak-perspective solution for the three matrices.

Suppose now that the error in each image point is bounded by some amount
$\epsilon$. This results in some bounded uncertainty region about $m_3'$.
From Section 2.3, for each image point separately its $\epsilon$-circle
propagates to a circle around $m_3'$ that is scaled by $S$ (Equation 18),
which gives a radius of $S\epsilon$. The error in each image point affects
the fourth point independently, and so the uncertainty region around the
fourth point is a circle centered at $m_3'$, and the three radii simply
sum together (where we must be careful to use the radii from the same
weak-perspective solution). In Equation 27, let
$S_0 = \sqrt{a_{11}^2 + a_{12}^2}$, $S_1 = \sqrt{b_{11}^2 + b_{12}^2}$,
and $S_2 = \sqrt{c_{11}^2 + c_{12}^2}$. Then the radius of the uncertainty
circle for $m_3'$ is

$$ R = S_0\epsilon_0 + S_1\epsilon_1 + S_2\epsilon_2 + \epsilon_3, \tag{28} $$

where $\epsilon_3$ is added to account for error in sensing $i_3$ (an
image point that corresponds to $m_3'$).

When the error in the image points is normally distributed, the linear
relationship allows us to determine the propagated uncertainty
distribution about any fourth point. If the Gaussian error in the $i$th
image point has standard deviation $\sigma_i$, then the uncertainty
distribution about $m_3'$ is normally distributed with standard deviation
(Appendix C)

$$ \sqrt{S_0^2\sigma_0^2 + S_1^2\sigma_1^2 + S_2^2\sigma_2^2 + \sigma_3^2}. \tag{29} $$
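Equations 28 and 29 are simple enough to state in a few lines of code. The sketch below assumes the three scale factors S_0, S_1, S_2 have already been computed from the matrices A, B, and C (all taken from the same weak-perspective solution); the function names are hypothetical and chosen only for this illustration.

```python
import math


def uncertainty_radius(scales, eps_basis, eps3):
    """Equation 28: bounded-error radius R of the fourth point's uncertainty circle.

    scales    -- (S0, S1, S2), scale factors of the matrices A, B, C
    eps_basis -- (eps0, eps1, eps2), error bounds on the three basis image points
    eps3      -- error bound on sensing the fourth image point
    """
    return sum(S * e for S, e in zip(scales, eps_basis)) + eps3


def propagated_sigma(scales, sigma_basis, sigma3):
    """Equation 29: standard deviation of the propagated Gaussian uncertainty."""
    return math.sqrt(sum((S * s) ** 2 for S, s in zip(scales, sigma_basis))
                     + sigma3 ** 2)
```

Note the difference between the two cases: bounded errors add linearly, while Gaussian standard deviations add in quadrature, so for equal inputs the Gaussian spread is never larger than the bounded radius.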

In summary, we have explained why Alter and Grimson found that the
uncertainty regions for three matched points are circular. Moreover, we
have provided a simple, analytic expression for those uncertainty regions.
To illustrate the value of this expression, we outline a robust version of
Huttenlocher and Ullman's 3D-from-2D alignment algorithm [27]. Given a
model and a cluttered image of a scene containing the model, the following
steps are repeated until the model is identified:

1. Hypothesize a pairing of three model and image points.

2. Using the hypothesis, project all of the model line segments into the
image.

3. Use Equation 28 to compute the uncertainty circles for the two
endpoints of every projected line segment. Then construct a tight
overestimate of every line segment's uncertainty region from the two
uncertainty circles, as in Alter and Grimson [4] (see Fig. 5).

4. Accept or reject the hypothesis, based on the number of line segments
for which there exist candidate image segments, and on the sizes of the
uncertainty regions (as in [4]). If accepted, return the identified model
and pose.

Alter and Grimson [4] demonstrated that this algorithm is expected to be
insensitive to false positives in cluttered scenes.

3  A  Study  of  Uncertainty  from  One 
Basis  Point 

We showed in Section 2 that for any shape traced out by the third model
point, the fourth model point traces out a mathematically similar shape,
up to an approximation. This section provides a study of the true shape of
the uncertainty region. We begin by considering exactly how larger changes
in $\theta$ and $\phi$ affect the appearance of each additional model
point. Using Equation 9, it is straightforward to see that, as $\phi$
changes with $\theta$ held constant, $(x, y)$ traces out an ellipse, with
center at $(s_0 d, 0)$ and with the major axis parallel to the $y$ axis.
We rewrite this equation as

$$ (x, y) = (s_0 d,\ 0) + (-a\sin\phi,\ b\cos\phi), \tag{30} $$

with minor axis $a = s_0 r\tan\theta$, major axis $b = s_0 r\sec\theta$,
and eccentricity $e = \sqrt{b^2 - a^2} = s_0 r$ (the distance from the
center to each focus). For $\theta = 0$, the ellipse equation becomes
$(x, y) = (s_0 d, 0) + (0, s_0 r\cos\phi)$, which forms a line segment
between the points $(s_0 d, s_0 r)$ and $(s_0 d, -s_0 r)$. As $\theta$
increases from 0, the center of the ellipse is unchanged; Fig. 6 shows the
3D interpretation. In addition, as $\theta$ increases both axes $a$ and
$b$ grow, and the ratio $a/b = \sin\theta$ approaches 1. Consequently
increasing $\theta$ sweeps out a growing family of concentric ellipses
that become increasingly circular.

Figure 6: As $\theta$ changes, $m_{01}$ is scaled so that $m_1$ always
projects onto $i_1$. The figure shows that any point on the line through
$m_{01}$ always projects onto the same location in the image. When $\phi$
changes, $m_2$ rotates about a point on this line, and that point projects
to the ellipse center; therefore the ellipse center does not change.

As $\theta$ changes, $(x, y)$ traces out a hyperbola, which we can see by
eliminating $\theta$ from Equation 9. From the $x$ and $y$ coordinates, we
have respectively:

$$ \tan\theta = -\frac{x - s_0 d}{s_0 r\sin\phi}, \qquad \sec\theta = \frac{y}{s_0 r\cos\phi}. $$

Using $1 + \tan^2\theta = \sec^2\theta$,

$$ \frac{y^2}{(s_0 r\cos\phi)^2} - \frac{(x - s_0 d)^2}{(s_0 r\sin\phi)^2} = 1. \tag{31} $$

This equation is a hyperbola centered at $(s_0 d, 0)$, with foci
$(s_0 d, \pm s_0 r)$, vertices $(s_0 d, \pm s_0 r\cos\phi)$, and
asymptotes $x = \pm y\tan\phi + s_0 d$. For $\phi = 0, \pi$, the hyperbola
becomes $(x, y) = (s_0 d, 0) \pm (0, s_0 r/\cos\theta)$, which gives two
vertically infinite half-lines starting at the foci. As $\phi$ increases,
the center and foci are unchanged, and the asymptotes rotate about
$(s_0 d, 0)$, becoming increasingly parallel to the $x$ axis.
Consequently, changing $\phi$ from 0 or $\pi$ sweeps out a concentric
family of hyperbolas that reach the $x$ axis when $\cos\phi = 0$, that is,
at $\phi = \pi/2$ or $3\pi/2$.
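The ellipse and hyperbola families described in this section can be verified numerically from Equation 9. The following sketch is an illustration only: it computes residuals that should vanish when a point generated by Equation 9 lies on the constant-theta ellipse or on the constant-phi hyperbola. The function names are hypothetical, and the residuals are undefined at theta = 0 or where sin(phi) or cos(phi) is zero, where the conics degenerate as described in the text.

```python
import math


def image_point(s0, d, r, theta, phi):
    """Equation 9: image position of the third model point."""
    return (s0 * d - s0 * r * math.tan(theta) * math.sin(phi),
            s0 * r * math.cos(phi) / math.cos(theta))


def ellipse_residual(s0, d, r, theta, phi):
    """Zero when (x, y) lies on the ellipse traced by varying phi at fixed theta."""
    x, y = image_point(s0, d, r, theta, phi)
    a = s0 * r * math.tan(theta)    # semi-axis along x
    b = s0 * r / math.cos(theta)    # semi-axis along y
    return ((x - s0 * d) / a) ** 2 + (y / b) ** 2 - 1.0


def hyperbola_residual(s0, d, r, theta, phi):
    """Zero when (x, y) lies on the hyperbola traced by varying theta at fixed phi."""
    x, y = image_point(s0, d, r, theta, phi)
    half_y = s0 * r * math.cos(phi)  # distance from the center to each vertex
    half_x = s0 * r * math.sin(phi)
    return (y / half_y) ** 2 - ((x - s0 * d) / half_x) ** 2 - 1.0
```

Both conics share the focal distance s0 * r, which is why the grid of curves collapses at the two points (s0 * d, ±s0 * r).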

By a simple translation and scale of $(x, y)$, we get the equation

$$ (X, Y) = \frac{1}{s_0 r} \left( (x, y) - (s_0 d,\ 0) \right) = \left( -\tan\theta\sin\phi,\ \frac{\cos\phi}{\cos\theta} \right). \tag{32} $$

Fig. 7 plots families of curves for changing $\phi$ and $\theta$ for this
equation. The elliptical curves are functions of $\phi$, and the
hyperbolic curves are functions of $\theta$. The four plots show the same
figure at different scales and for different ranges of $\theta$. In
left-to-right, top-to-bottom order, each plot is a close-up of the center
area in the next plot.

Figure 7: Plots of $(X, Y)$ for different ranges of $\theta$. Upper left:
$\theta \in [0, 12]$ deg. Upper right: $\theta \in [0, 45]$ deg. Lower
left: $\theta \in [0, 60]$ deg. Lower right: $\theta \in [5, 85]$ deg. The
curves are separated by changes in angle of 4 deg, 5 deg, 6 deg, and
10 deg (in left-to-right, top-to-bottom order).


The plots in Fig. 7 are for $m_2'$ translated and scaled. For $m_3'$ we
get

$$ (X', Y') = \frac{1}{s_0 r'} \left( (x', y') - (s_0 d',\ 0) \right) = \left( -\tan\theta\sin(\phi + \tau'),\ \frac{\cos(\phi + \tau')}{\cos\theta} \right). \tag{33} $$

This involves a different scale factor and translation, but otherwise, the
only difference for the fourth model point is in $\tau'$. That is, varying
$\phi$ causes the third and fourth model points to traverse ellipses that
may differ in translation and scale, but that do not differ in shape. In
Fig. 7, we vary $\theta$ and $\phi$ and show the image regions that the
projected points $m_2'$ and $m_3'$ occupy as a result. Since we normalize
the plots of the two points, as the third model point varies along an
ellipse, the fourth model point projects onto the same ellipse. If
$(X, Y)$ moves along the curves in Fig. 7 and traces out a closed region,
then so will $(X', Y')$. The two filled-in areas in each plot illustrate a
different pair of bounded error regions that could be traced out
simultaneously by some $(X, Y)$ and $(X', Y')$; $(X, Y)$ or $(X', Y')$
could be either error region. In this case, as one point varies within one
of the bounded error regions, the other point will vary within the other
error region, illustrating how uncertainty propagates.

In the figure, there are two distinguished points where the grid
collapses. These points are the hyperbola foci $(0, \pm 1)$, which were
$(s_0 d, \pm s_0 r)$ in the original $(x, y)$ curve. These foci correspond
to $(\theta, \phi) = (0, 0)$ and $(0, \pi)$, which occur when the plane
containing the basis model points is parallel to the image. From the areas
around $(0, 1)$ in Figs. 7 and 8, it is clear that an $(X', Y')$
uncertainty region may not be simply a scaled and rotated version of the
$(X, Y)$ region. Therefore, circular error in the third point in general
will not produce circular uncertainty, exactly, in the fourth point. In
fact, we can see from Fig. 8-left that when $\theta$ and $\phi$ are small,
uncertainty in the third image point can lead to strange shapes for the
propagated uncertainty region: Suppose that $(X, Y)$ traces out the large
region on the right (region A). If $\tau' = 128$ deg, then $(X', Y')$
traces out the similar-looking, large region on the left (region B). But
if $\tau' = 72$ deg, then $(X', Y')$ traces out the non-convex, curved
region on top (region C). It should be kept in mind that the shapes, not
the sizes, of these regions are what matter here, since the plot is
normalized: After being scaled by $s_0 r'$, the region on top may be
significantly larger than the region on the right, but its unusual shape
would be unchanged. As a result, odd-shaped uncertainty regions can occur
even when there is little error in the basis points.

Figure 8: Close-up where $\phi \in [0, 90]$ deg. In both plots,
$\theta \in [0, 16]$ deg and the curves are separated by changes in
$\theta$ and $\phi$ of 8 deg.

When the third point's error region contains one of the foci, then the
two uncertainty regions for the fourth point's two weak-perspective pose
solutions are one and the same; otherwise the regions will be distinct,
although they may overlap. As an example, in Fig. 8-left suppose now that
region C is traced out by $(X, Y)$. If $\tau' = 72$ deg, then the two
large regions correspond to the two weak-perspective solutions for the
$(X', Y')$ region, obtained by alternating the sign of $(\theta, \phi)$.
If $\tau' = 8$ deg, then the two uncertainty regions for the two solutions
would overlap, but they still would be distinct. On the other hand,
suppose that the region traced by $(X, Y)$ additionally includes the point
$(0, 1)$, as in Fig. 8-right (region D). Then when $\tau' = 72$ deg the
two large regions merge into the single, "H"-shaped uncertainty region
shown in the figure (region E). So we see that the similarity transform
can be a poor approximation for poses with small values of $\theta$ and
$\phi$ even with small amounts of error.

For large $\theta$ (bottom right picture in Fig. 7), the ellipses become
concentric circles, and the hyperbolas become straight lines. In this
circumstance, any $(X', Y')$ region is the same as the corresponding
$(X, Y)$ region, except for a rotation about the origin. In this case, a
similarity transform will exactly relate the errors, and will be
independent of the model pose (as it was for planar objects).

Another conclusion we can draw applies when the third point has a bounded
error region that is very large. Suppose there is circular error of radius
$\epsilon$ in the third point. As $\epsilon$ grows, the error circle will
include the point where the two pose solutions merge, and the boundaries
of the error circle and the fourth point's propagated uncertainty region
will reach the range of $\theta$ where they are related by a similarity
transformation. For large enough $\epsilon$, then, the error in the third
point will result in a single, circular uncertainty region for the fourth
point, regardless of where the image and model points are nominally
located. This is surprising because one might expect that the error
incurred by using the similarity-transform approximation grows as
$\epsilon$ grows. Even though this may be true when $\epsilon$ is small,
for large enough $\epsilon$ the error will decrease.

Finally, the above discussion shows that the similarity-transform
approximation may hold further than the two first-order approximations
that we used to compute it. To obtain an analytic expression for the
propagated uncertainty, Section 2 took first-order approximations to the
errors in the third and fourth points in terms of the 3D pose parameters,
$\theta$ and $\phi$. Then these approximations were combined to get a
similarity transform that directly relates the errors in the points. This
similarity transform may hold further than the two first-order
approximations. For instance, for high values of $\theta$, where the
similarity transform holds exactly, the first-order approximations do not
(see Fig. 9). This suggests that higher-order error terms may be
cancelling when we combine the two first-order approximations to get a
similarity transform. Section 4.3 empirically shows that the similarity
transform can hold further than the first-order approximation. This
indicates that it may be wise to propagate errors directly in image space
rather than from transformation space, as was done for example in [21].

Figure 9: For large $\theta$, the ellipses traced out by changing $\phi$
are circles. In this case, changing $\phi$ causes movements in $(X, Y)$
and $(X', Y')$ that are exactly related by a similarity transform.
First-order approximations to the movements, however, assume that the two
points move along the tangent lines at $(X, Y)$ and $(X', Y')$, which is
correct only if the movements are small.


4  Experiments 

We have run three experiments to test the results in Sections 2 and 3.
The first experiment compares our results to those of Alter and
Grimson [4], who studied the case of bounded error of radius $\epsilon$.
In particular, we test our formula for the radius of the fourth point
error circle. The second experiment looks more generally at the accuracy
of the similarity transforms in Equation 27 for predicting where the
fourth point moves when there is error in the basis points. This
experiment is repeated for uniform and Gaussian error. The third
experiment compares the accuracy of the similarity transform with the
first-order approximation used to derive it.


4.1  Comparison  to  past  results 


Alter and Grimson assumed bounded error in the image points using circles
of radius $\epsilon$. They showed experimentally that the uncertainty
region for a fourth model point is closely approximated by a circle
centered at the nominal point, which is the uncertainty region we derived
analytically in Section 2.4. Following Alter and Grimson, we refer to
these fourth-point error circles as uncertainty circles. Alter and Grimson
computed the radius of an uncertainty circle by densely sampling the
fourth-point uncertainty region, and then took the radius to be the
maximum distance from the nominal point to a sampled point. As a result,
the computed uncertainty circle was an upper bound on the uncertainty
region, up to the fineness of the sampling.

To see how our uncertainty circles compare to Alter and Grimson's, we run
a series of trials like those in [4]: At each trial, we randomly generate
three pairs of model and image points and a set of seven unmatched model
points. We let $\epsilon = 5$ pixels. For each unmatched model point, we
use Equation 28 to compute our predicted radius ($R_f$). To compute the
"maximum radius" ($R_m$) in [4], we take 25 samples along the boundary of
each $\epsilon$-circle, and then take all triples between the samples to
get 15,625 triples. For each triple, we solve for the model pose and use
the pose to project each unmatched model point into the image. This gives
15,625 projected model points per uncertainty region. For each uncertainty
region, we compute the maximum distance from the nominal point to any of
its projected model points.
For an error measure, we use the relative error from our
radius to the maximum radius, that is, $|R_m - R_f| / R_m$.
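The sampling procedure used to obtain the maximum radius can be sketched as follows. This is a minimal illustration under stated assumptions: `project_fourth` is a hypothetical callable standing in for "solve for the model pose from three image points and project the unmatched model point", which is not implemented here, and the function names are introduced only for this example.

```python
import itertools
import math


def circle_samples(center, eps, n=25):
    """n evenly spaced points on the boundary of the eps-circle around center."""
    cx, cy = center
    return [(cx + eps * math.cos(2.0 * math.pi * k / n),
             cy + eps * math.sin(2.0 * math.pi * k / n)) for k in range(n)]


def max_radius(basis_points, eps, project_fourth, n=25):
    """Largest distance from the nominal fourth point over all sampled triples.

    basis_points   -- the three nominal image points (i0, i1, i2)
    project_fourth -- hypothetical callable (i0, i1, i2) -> (x, y)
    """
    nominal = project_fourth(*basis_points)
    samples = [circle_samples(p, eps, n) for p in basis_points]
    worst = 0.0
    for triple in itertools.product(*samples):  # n**3 triples (15,625 for n = 25)
        worst = max(worst, math.dist(nominal, project_fourth(*triple)))
    return worst
```

Because the maximum is taken over boundary samples only, the result is an upper bound on the sampled region up to the fineness of the sampling, as noted above for the procedure in [4].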

We tested 1,163 uncertainty circles, over which the average relative
error was 1.45%. Table 1 shows the percent of uncertainty circles for
which relative error was less than some threshold. The results show that
our circles reasonably approximate the maximum-radius circles from Alter
and Grimson, who showed that the maximum-radius circles reasonably
approximate the true regions. The results also show that the
maximum-radius circles can at times overestimate our circles by a
significant amount. For instance, 1.2% of the time our circles will be 10%
smaller than the maximum-radius circles.

Table 1 could be used to determine a correction factor for the circle
radius: Suppose that a recognition system has matched three model and
image points and is using the analytic solution for the uncertainty
circles to decide what region in the image to search for additional
matches. Suppose further that we want the system to conservatively
estimate the effects of error, so that it can give some guarantee of no
false negatives (i.e., that no valid image points will be missed) some
high percentage of the time, while at the same time increasing the chance
of false positives as little as possible. Then, for a chosen percent of
the uncertainty circles, the percent relative error in the table tells us
by how much to increase the radii of the circles so that all points in the
true uncertainty region are included.

4.2  Accuracy  of  the  similarity  transform 

Section 2.4 showed how we can approximate the effects of error in the
basis points using three similarity transforms. Equation 27 gives the
error in the fourth point as a function of the errors in the basis points.
The next experiment estimates how well this approximation works when the
error in the image points is distributed uniformly, or according to a
Gaussian. We run a series of trials like those in the previous experiment,
where at each trial we generate a random model and a random image triple.
Assuming uniform error for now, we then uniformly perturb each image point
within a circle of radius $\epsilon$. Using the perturbed image points, we
compute the distances between where the similarity transforms predict the
fourth model point will appear and its actual projected location. For each
unmatched model point, there are two such distances corresponding to the
two weak-perspective solutions. For each trial, there are seven unmatched
model points, giving fourteen distances per trial. Over 10,000 trials,
140,000 unmatched model points were tested.

For uniform error, the average distance between the similarity point (the
point predicted by the similarity transforms) and the true point was 0.06
pixels. Table 2 gives the percent of unmatched points for which the
distance is less than various amounts of tolerated error. The results show
that the similarity transforms very accurately predict the movement of the
fourth point, with 98% probability of being in error by less than one
pixel.

For  Gaussian  error,  we  run  the  same  experiment  but 
instead  sample  a  2D  Gaussian  distribution,  using  a  stan¬ 
dard  deviation  of  2.5  pixels  as  was  done  by  Sarachik  and 
Grimson  [39].  We  still  use  a  bound  of  e  =  5  pixels  on 
the  allowed  error  in  the  image  points  to  guarantee  that 
Gaussian  error  will  lead  to  better  results  than  uniform 
error.  In  fact,  we  might  expect  the  results  to  be  sig¬ 
nificantly  better,  since  the  sampled  points  will  tend  to 
contain  less  error.  It  turns  out,  however,  that  the  results 
are  only  slightly  better  than  for  uniform  (see  Table  2), 
with  the  same  average  error  of  .06  pixels.  It  appears  that 
for  Gaussian  error  to  improve  over  simple  uniform  error, 
the  Gaussian  distributions  would  have  to  be  significantly 
more  peaked  around  their  nominal  positions. 
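
One way to draw such bounded Gaussian errors is rejection sampling. The sketch below assumes that the bound is enforced by discarding samples that fall outside the disc; the text does not say how the bound was applied, so this is an assumption, and the function name is hypothetical.

```python
import math
import random


def sample_truncated_gaussian(sigma, bound, rng):
    """Draw (dx, dy) from an isotropic 2D Gaussian, rejecting draws outside the bound."""
    while True:
        dx = rng.gauss(0.0, sigma)
        dy = rng.gauss(0.0, sigma)
        # Keep only samples inside the disc of radius `bound`.
        if math.hypot(dx, dy) <= bound:
            return (dx, dy)
```

With a standard deviation of 2.5 pixels and a bound of 5 pixels, the radius of an untruncated sample is Rayleigh distributed, so a fraction 1 - exp(-2), roughly 86%, of the draws are accepted. The truncated samples are therefore not dramatically more concentrated than uniform ones.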

In  conclusion,  the  two  experiments  on  random  model 
and  image  points  indicate  that  the  linear  approxima¬ 
tion  is  reasonably  accurate  for  up  to  five  pixels  of  error. 
Fig.  10  shows  circular  uncertainty  regions  that  we  com¬ 
puted  using  a  real  model  and  image  of  a  telephone.  By 
hand,  we  measured  corners  of  the  telephone  to  obtain 
a  model  and  selected  corresponding  corners  in  the  im¬ 
age.  The  three  smaller  circles  are  the  error  bounds  for 
the  matched  image  points.  The  remaining  circles  are  the 
correct  search  regions  for  finding  additional  matches. 

4.3  Similarity  transform  vs.  a  first-order 
approximation 

We have used two first-order approximations. One relates
changes in the basis image points to changes in pose. The
second relates changes in pose to changes in $m_j$. When we
use both approximations, we obtain a similarity transform.

We will now compare this to the results of using an exact
determination of the effect of changes in the basis image
points on pose, along with a first-order approximation to the
subsequent effect of pose changes on $m_j$. If the effects of
these two first-order approximations are uncorrelated, we
would expect to obtain more accurate results when we replace
one with an exact value. On the other hand, if error in the
first-order approximations tends to cancel, the similarity
transform that follows from both approximations will tend to
be more accurate. Section 3 showed analytically that this can
happen in some circumstances. We now explore this possibility
experimentally.

In  the  same  way  as  in  the  previous  experiment,  we 
generate  trials  of  random  models  and  image  triples,  and 
project  the  random  unmatched  model  points  into  the 
image. As in Section 3, we put error in only one of the
basis points, say $i_2$. To better compare the first-order
and similarity approximations, at each trial we generate
an error basis point as follows: We change either $\phi$ or
$\theta$ from its value at the nominal pose until $i_2$ moves 5
pixels from its nominal position. This gives us an error
vector in the third image point. For each of the projected,
unmatched model points we compute the scaled rotation matrix
$A$ as in Equation 23, and apply it to this error vector. This
gives the error point predicted by the similarity transform.

To compute the error point predicted by a first-order
approximation from pose space, we move the projected point
along one of the tangent lines defined in Section 2.3. The
amount we move is determined by which pose parameter we
changed to compute the error in $i_2$. In particular, for
changing $\phi$ the first-order error vector is $\Delta\phi$ times
the tangent vector with respect to $\phi$, and for changing
$\theta$ it is $\Delta\theta$ times the tangent vector with respect
to $\theta$, where we know $\Delta\phi$ and $\Delta\theta$ exactly from
our generation of the error basis point.

To  measure  propagated  error,  we  first  use  the  error 
basis  point  to  calculate  the  two  possible  locations  where 
the  fourth  point  actually  goes.  To  measure  the  error  in 
the  similarity  transform,  we  calculate  the  distance  from 
the  point  predicted  by  the  similarity  transform  to  the 
closer  of  the  two  actual  points,  and  similarly  for  the  first- 
order approximation. Table 3 shows the results from
histogramming the error data over a series of 10,000 trials
with 7 projected points per trial. In the table the
similarity transform is about two to five times more accurate
than the first-order approximation. This suggests that better
results may be obtained by propagating error directly from the
basis image points to the predicted locations of the fourth
points. By first estimating the errors in pose space and then
propagating these errors back to image space, some accuracy
may be lost, unless special care is taken to make sure the
appropriate high-order error terms cancel.

Figure 10: Circular uncertainty regions computed for points at the corners of a telephone. The three smaller circles
show error regions of five pixels about the three points used to generate the model pose. The remaining, larger circles
show uncertainty circles. Small crosses show the actual locations of the image points, which were selected by hand
but are still a bit noisy. Small dots show the projected locations of the model points in the determined pose.

Absolute error (px)   99%: changing $\phi$   99%: changing $\theta$   90%: changing $\phi$   90%: changing $\theta$
Similarity trans.     0.57                 0.51                   0.09                 0.09
1st-order approx.     1.02                 1.67                   0.26                 0.45

Table 3: The amount of error allowed in the fourth point that includes 99% or 90% of the tested points. The error is
the distance from the true location of the fourth point to the location predicted by the similarity transform (which is
the approximation we suggest) or the location predicted by the first-order approximation. In this experiment there
was error in one basis point, which was moved 5 pixels by changing either $\phi$ or $\theta$ from its nominal value.

5  Perspective  Projection 

In this section we apply the same technique to perspective
projection to obtain again a linear relationship between
the errors, but in this case the form of the linear
relationship is much more complicated. As before, we
begin by looking at the effects of error in exactly one of
the basis points. We assume we know the camera focal
length $f$ and the center of projection $c$.

We introduce a representation for any third model point
similar to the one we used in the scaled-orthographic case.
Since $i_0$ and $i_1$ are fixed, the line segment between $m_0$
and $m_1$ is free to rotate and translate in the plane through
$(c, i_0, i_1)$, as long as $m_0$ and $m_1$ remain on the lines
through $(c, i_0)$ and $(c, i_1)$, respectively (see Fig. 11).
After this rotation and translation, the only remaining degree
of freedom is a rotation about the line through $(m_0, m_1)$.

Initially, we rigidly transform the model so that the vector
from $m_0$ to $m_1$ points in the same direction as $i_{01}$, and
$m_2$ is in the $z = 0$ plane. For the image coordinate system,
we let the origin be at $i_0$, the $z$ direction be orthogonal to
the image plane, and the positive $x$ axis be along $i_{01}$.
Also, we let $n$ be the unit vector that is normal to the
plane through $(c, i_0, i_1)$. Next the model is rotated by
$\phi$ about the $x$ axis, then rotated by $\theta$ about $n$
(denoted by $R_{\theta,n}$), and then translated by $u$. Let $p$
equal the second model point after the rotation. Then using
the relative coordinates for the model points from Fig. 1,
$p = (d, r\cos\phi, r\sin\phi)^T$, so that

$$m_2 = u + R_{\theta,n}\, p. \tag{34}$$

For the translation $u$, let $L$ be the distance from $c$
to $m_0$, and let $v$ be the unit vector pointing from $c$ to
$i_0 = (0,0,0)$, which implies $v = -c/\|c\|$. Then
$u = c + Lv$, where $L$ must still be determined. In Fig. 11,
let $\psi$ be the angle between $c - i_0$ and $i_1 - i_0$, and let
$\psi_{01}$ be the angle between $i_0 - c$ and $i_1 - c$. Also let
$R_{01}$ be the distance between $m_0$ and $m_1$. Through some
trigonometry, Appendix B computes that

$$L = \sin(\theta + \psi + \psi_{01}) \left( \frac{R_{01}}{\sin\psi_{01}} \right). \tag{35}$$

Let $m_2 = (X, Y, Z)$. Since the image coordinate system
is based at $i_0$ with the camera center point at
$c = (c_x, c_y, -f)$, projecting $m_2$ into the image gives
the point

$$(x, y) = \left( f\,\frac{X - c_x}{Z + f} + c_x,\;\; f\,\frac{Y - c_y}{Z + f} + c_y \right). \tag{36}$$
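
The projection step can be sketched in code. The function below is an illustration derived only from the geometry stated in the text (image plane at z = 0, origin at the first image point, camera center at (c_x, c_y, -f)); the function name and the check for points behind the camera are assumptions added for the example.

```python
def project_perspective(point, center):
    """Project a 3D point onto the image plane z = 0 through the camera center.

    point  = (X, Y, Z) in image-centered coordinates
    center = (cx, cy, -f), with focal length f > 0
    """
    X, Y, Z = point
    cx, cy, cz = center
    f = -cz
    depth = Z + f  # distance of the point from the camera center along z
    if depth <= 0:
        raise ValueError("point is not in front of the camera")
    # Intersect the ray from the center through the point with z = 0.
    t = f / depth
    return (cx + t * (X - cx), cy + t * (Y - cy))
```

A point that already lies in the image plane projects to itself, which is a quick sanity check on the coordinate conventions.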

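From this equation, we can compute the partial derivatives
of $x$ and $y$ with respect to $\theta$ and $\phi$, which give
$t_\theta$ and $t_\phi$. By substituting $r'$ for $r$, $d'$ for $d$,
and $\phi$ plus the fourth point's angular offset for $\phi$, we
can analogously compute $t'_\theta$ and $t'_\phi$. Then we can
solve Equation 23 for $A$, and we will get a linear transform
relating the error in $i_2$ to the error in $m_j'$ (see
Appendix B): Let $x_\theta = \partial x/\partial\theta$,
$x_\phi = \partial x/\partial\phi$, and similarly for $y_\theta$,
$y_\phi$, $x'_\theta$, $x'_\phi$, $y'_\theta$, $y'_\phi$. Then

$$A = \frac{1}{x_\theta y_\phi - x_\phi y_\theta}
\begin{bmatrix}
y_\phi x'_\theta - y_\theta x'_\phi & -x_\phi x'_\theta + x_\theta x'_\phi \\
y_\phi y'_\theta - y_\theta y'_\phi & -x_\phi y'_\theta + x_\theta y'_\phi
\end{bmatrix}. \tag{37}$$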
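
As a check on the form of this matrix, the following sketch solves for the 2x2 linear map that sends one pair of tangent vectors to another. It assumes that Equation 23 amounts to requiring A to map each tangent vector of the third image point to the corresponding tangent vector of the projected fourth point; that reading, and all function and argument names, are assumptions made for illustration.

```python
def solve_linear_map(t_theta, t_phi, tp_theta, tp_phi):
    """Return the 2x2 matrix A with A t_theta = tp_theta and A t_phi = tp_phi."""
    x_th, y_th = t_theta
    x_ph, y_ph = t_phi
    xp_th, yp_th = tp_theta
    xp_ph, yp_ph = tp_phi
    det = x_th * y_ph - x_ph * y_th
    if abs(det) < 1e-12:
        raise ValueError("tangent vectors are (nearly) parallel")
    # A = T' T^{-1}, written out entry by entry.
    return (
        ((y_ph * xp_th - y_th * xp_ph) / det, (-x_ph * xp_th + x_th * xp_ph) / det),
        ((y_ph * yp_th - y_th * yp_ph) / det, (-x_ph * yp_th + x_th * yp_ph) / det),
    )


def apply_map(A, v):
    """Multiply the 2x2 matrix A by the 2-vector v."""
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])
```

Applying the returned matrix to a small error vector in the third image point gives the linearized displacement of the projected fourth point. Unlike the scaled-orthographic case, nothing forces this matrix to be a scaled rotation.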

If there is bounded, circular error in $i_2$, the linear
transform will produce an elliptical uncertainty region
for $m_j'$, whose parameters could be determined analytically.
This differs from the scaled-orthographic case, where the
uncertainty region is a disc. To handle circular error in all
three points for scaled-orthographic projection, we had to
convolve together three circles. For perspective, we would
have to convolve three ellipses. To handle bounded error in
three basis points using perspective projection, we can apply
the algorithm that will be proposed in Section 6.

For Gaussian error, on the other hand, the method
in Appendix C applies equally well to linear transforms
as to similarity transforms. As before, we get an analytic
solution for the uncertainty region of a projected model
point, and that uncertainty region is normally distributed.
The only difference is that the normal distribution need not
be circularly symmetric.
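
The reason the distribution stays Gaussian but loses its circular symmetry can be seen from the standard rule for a linear map of a Gaussian vector. The lines below are a sketch of that textbook property under the assumption of independent, isotropic sensing errors with variance $\sigma^2$ in each basis point; they are not a reproduction of Appendix C, and $Q$ and $s$ are symbols introduced here for a rotation and a scale factor.

```latex
e_2 \sim N(0, \sigma^2 I)
  \;\Longrightarrow\; A e_2 \sim N\!\left(0,\; \sigma^2 A A^{T}\right),
\qquad
A e_0 + B e_1 + C e_2 \sim N\!\left(0,\; \sigma^2 (A A^{T} + B B^{T} + C C^{T})\right).

% If A is a scaled rotation, A = sQ with Q^T Q = I, then
\sigma^2 A A^{T} = s^2 \sigma^2 I ,
% which is circularly symmetric; for a general linear A it is not.
```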

6  nth-Point  Uncertainty  Region 

It has been shown [21] that recognition algorithms that
use a small number of randomly-matched points to determine
pose are sensitive to false positive identifications of
objects, because these poses are not sufficiently stable
and lead to large uncertainty regions. One solution is to
use poses based on more information to derive smaller
uncertainty regions. Assuming a bounded error model,
this section shows how to compute the uncertainty region
of an $(n+1)$'st model point given $n$ matched model and
image points, for any value of $n$. Our linearized models
of projection allow us to determine linear constraints on
the set of feasible error vectors consistent with a match
between model and image points. Having expressed our
knowledge about pose using linear constraints, we apply
linear programming to optimize a set of objective
functions whose solution tightly bounds the uncertainty
region. In general, this technique applies to any linear
projection model, including affine models and including
the linearized perspective and weak-perspective models
that we derived in this paper.

To demonstrate the idea, we suppose that the error
in each image point is bounded by a square of width
$2\epsilon$. We emphasize, however, that the same reasoning
will apply to any convex, polygonal error bound, so that
we may approximate a circle, or any other convex error
bound, as closely as we wish. With square error bounds,
a match between image and model points can be consistent
only if there exists a pose that brings the x- and
y-coordinates of all matched points to within $\epsilon$ pixels
of each other. Let $m_i' = (x_i', y_i')$ be the projection of
the $i$'th matched model point, in some nominal pose. Let
$i_i = (x_i, y_i)$ be the location of the corresponding image
point. Also, let $e_i = (x_i^e, y_i^e)$ be a vector representing
the deviation between a model point's projected position in
the nominal pose and its true position (see Fig. 4). Since
we choose a pose by aligning the first three model points
with image points, for $0 \le i \le 2$ this deviation is also
the actual error that occurred in sensing the image points.
For $i \ge 3$, $e_i$ does not depend on where we have sensed
the $i$'th image point, but rather is a function of the
sensing error in the first three image points. We then model
error by assuming a model point's true position, $m_i' + e_i$,
is within $\epsilon$ of its corresponding image point, $i_i$.
That is,

$$x_i - \epsilon \le x_i' + x_i^e \le x_i + \epsilon, \tag{38}$$

$$y_i - \epsilon \le y_i' + y_i^e \le y_i + \epsilon. \tag{39}$$

Previously, we used the matrices $A$, $B$, $C$ to represent
linear transformations between error in the first three
points and the fourth point. We now let $A_i$, $B_i$, $C_i$ be
the corresponding matrices for the $(i+1)$'st point (e.g.,
$A_3 = A$). We let $a_i^1$ and $a_i^2$ be the first and second
rows of the matrix $A_i$, and define $b_i^1$, $b_i^2$, $c_i^1$,
$c_i^2$ similarly. In both the cases of scaled-orthographic
and perspective projections, we know that we can write

$$e_i = (a_i^1 \cdot e_0,\; a_i^2 \cdot e_0) + (b_i^1 \cdot e_1,\; b_i^2 \cdot e_1) + (c_i^1 \cdot e_2,\; c_i^2 \cdot e_2), \tag{40}$$

for $3 \le i \le n - 1$. Consequently, for matched points 3
through $n - 1$, we may substitute a linear combination
of the first three error vectors for $e_i$, giving us
additional constraints that the first three points must meet
in order to lead to a consistent solution. In all, we get the
following constraints: For $i \in [0, 2]$,

$$-\epsilon \le x_i^e \le \epsilon, \qquad -\epsilon \le y_i^e \le \epsilon,$$

and for $i \in [3, n - 1]$,

$$x_i - \epsilon \le x_i' + a_i^1 \cdot e_0 + b_i^1 \cdot e_1 + c_i^1 \cdot e_2 \le x_i + \epsilon,$$

$$y_i - \epsilon \le y_i' + a_i^2 \cdot e_0 + b_i^2 \cdot e_1 + c_i^2 \cdot e_2 \le y_i + \epsilon.$$

This  set  of  constraints  can  be  satisfied  if  and  only  if 
there  is  a  set  of  error  vectors  for  the  first  three  points 
that  will  bring  all  projected  model  points  to  within  the 
error  bounds  of  their  matching  image  points. 
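
To make the constraint set concrete, the sketch below assembles it in the form G e <= h, where e stacks the six components of the three error vectors. The data layout (tuples of 2x2 matrices, nominal projections and sensed image points) and the function name are assumptions chosen for illustration.

```python
def build_constraints(eps, extra_matches):
    """Return (G, h) so that feasible error vectors satisfy G e <= h row by row.

    e = (e0x, e0y, e1x, e1y, e2x, e2y).
    extra_matches lists, for each matched point with index >= 3, a tuple
    (A, B, C, projected, image): three 2x2 matrices, the nominal projection
    (x', y') of the model point, and the sensed image point (x, y).
    """
    G, h = [], []
    # Box constraints on the first three error vectors: |component| <= eps.
    for k in range(6):
        for sign in (1.0, -1.0):
            row = [0.0] * 6
            row[k] = sign
            G.append(row)
            h.append(eps)
    # Two two-sided constraints (x and y) for every additional match.
    for A, B, C, projected, image in extra_matches:
        for axis in (0, 1):
            coef = list(A[axis]) + list(B[axis]) + list(C[axis])
            offset = image[axis] - projected[axis]
            G.append(coef)                  # coef . e <= offset + eps
            h.append(offset + eps)
            G.append([-c for c in coef])    # -coef . e <= eps - offset
            h.append(eps - offset)
    return G, h
```

With n matched points this yields 12 + 4(n - 3) = 4n rows over 6 unknowns, which is the problem size quoted later in this section.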

We have formulated our knowledge of model pose,
based on a match between $n$ image and model points,
in terms of linear constraints on the components of the
three error vectors $e_0$, $e_1$, and $e_2$. We may now use
linear programming to maximize any linear objective function,
subject to these constraints. In particular, we can
formulate linear objective functions that express, for several
directions, the errors in an additional model point's
predicted position, and then extremize these functions. For
example, if we maximize the linear objective function

$$a_n^1 \cdot e_0 + b_n^1 \cdot e_1 + c_n^1 \cdot e_2, \tag{41}$$

we will find the maximum x displacement that the
projection of the $(n+1)$'st model point can have from its
nominal position. By maximizing the negation of this
expression, and similar expressions for the y values, we
may put a rectangle about the possible locations of the
$(n+1)$'st point.
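
A small, dependency-free sketch of this rectangle computation follows. It does not use simplex: because the problem has only six variables, it simply enumerates the vertices of the feasible polytope (every choice of six constraints that intersect in a feasible point) and keeps the best one. This brute-force substitute is practical only for a handful of matches, and all names are hypothetical.

```python
from itertools import combinations


def solve_square(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination; return None if singular."""
    n = len(M)
    aug = [list(map(float, M[r])) + [float(rhs[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def maximize(objective, G, h, tol=1e-9):
    """Maximize objective . e subject to G e <= h; return None if infeasible."""
    dim = len(objective)
    best = None
    for rows in combinations(range(len(G)), dim):
        vertex = solve_square([G[r] for r in rows], [h[r] for r in rows])
        if vertex is None:
            continue
        feasible = all(
            sum(g * v for g, v in zip(G[r], vertex)) <= h[r] + tol
            for r in range(len(G))
        )
        if feasible:
            value = sum(c * v for c, v in zip(objective, vertex))
            if best is None or value > best:
                best = value
    return best


def bounding_rectangle(row_x, row_y, G, h):
    """Return (x_min, x_max, y_min, y_max) displacements, or None if infeasible."""
    x_max = maximize(row_x, G, h)
    if x_max is None:
        return None
    x_min = -maximize([-c for c in row_x], G, h)
    y_max = maximize(row_y, G, h)
    y_min = -maximize([-c for c in row_y], G, h)
    return (x_min, x_max, y_min, y_max)
```

Here `row_x` and `row_y` are the six coefficients of the x and y objective functions for the new point, that is, the first and second rows of its three matrices laid side by side. The box constraints on the first three error vectors keep the feasible set bounded, so the maximum is always attained at a vertex.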

Using a similar method, we can in general place a
convex polygon of any shape about the possible locations
of the $(n+1)$'st point. Suppose we wish to bound the
location of the $(n+1)$'st point in some direction other
than along the x or y axis. Let $(\Delta x, \Delta y)$ be a vector
in that direction. Then we may achieve this by maximizing
the objective function formed by taking the dot product
of $(\Delta x, \Delta y)$ and $e_n$. By substituting our expression
for $e_n$ as a linear combination of the first three error
vectors, we get a linear expression in these values. By
finding the extreme values of the feasible positions of the
$(n+1)$'st model point, we may put a convex polygon about
these positions which will be more accurate than a square.
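
For the special case of exactly three matches, the feasible set is just the box of side 2e in each of the six error components, so each of these linear programs has a closed-form answer: the maximum of a linear function over the box is e times the sum of the absolute values of its coefficients. The sketch below uses that shortcut to bound the point in several evenly spaced directions. The names are hypothetical, and the shortcut does not apply once additional matches add constraints.

```python
import math


def direction_objective(direction, row_x, row_y):
    """Coefficients of (dx, dy) . e_n as a linear function of the six error components."""
    dx, dy = direction
    return [dx * ax + dy * ay for ax, ay in zip(row_x, row_y)]


def polygon_bounds_three_matches(row_x, row_y, eps, sides):
    """Return [(direction, limit)] half-planes bounding the new point's displacement.

    Valid only when the constraints are the box |component| <= eps,
    i.e. when exactly three points have been matched.
    """
    bounds = []
    for k in range(sides):
        angle = 2.0 * math.pi * k / sides
        d = (math.cos(angle), math.sin(angle))
        objective = direction_objective(d, row_x, row_y)
        # Maximum of a linear function over a box: eps times the L1 norm.
        bounds.append((d, eps * sum(abs(c) for c in objective)))
    return bounds
```

Each returned pair describes one side of the polygon: the displacement of the point from its nominal position, projected onto the direction, cannot exceed the limit.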

We should note that linear programming is very efficient.
It is known to be polynomial time in the worst case. In
practice, for problems with $l$ variables and $m$ constraints,
the most common algorithm, simplex, is found to usually take
time proportional to $lm^2$ (Strang [41]), and many highly
optimized commercial implementations of simplex exist. Our
problem has 6 variables and $4n$ constraints when $n$ points
are matched. When the number of variables in a problem is
fixed and only the number of constraints grows, as in our
case, there are algorithms that take linear expected time
(see Seidel [40], for example).

When the errors in the image points are Gaussian distributed
and there are more than three matched points, the Kalman
filter can be used to recursively compute Gaussian
distributions for the error vectors $e_0$, $e_1$, and $e_2$,
similar to Hel-Or and Werman [24]. Given Gaussian
distributions for $e_0$, $e_1$, and $e_2$, Appendix C shows
how to obtain a Gaussian distribution for the possible
locations of any $(n+1)$'st point.
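
As a rough sketch of what such a recursive update could look like, the lines below write out the textbook Kalman measurement update for this setting. This is not necessarily the formulation of Hel-Or and Werman. It assumes that the three error vectors are stacked into one 6-vector $e$ with current estimate $\hat{e}$ and covariance $P$, that the $i$'th additional match supplies the measurement $z_i = i_i - m_i'$, that $H_i = [A_i \; B_i \; C_i]$ is the 2x6 matrix formed from the three matrices for that point, and that the sensing noise $n_i$ has covariance $R = \sigma^2 I$.

```latex
\begin{aligned}
z_i &= H_i\, e + n_i, \qquad n_i \sim N(0, R), \\
K_i &= P H_i^{T} \left( H_i P H_i^{T} + R \right)^{-1}, \\
\hat{e} &\leftarrow \hat{e} + K_i \left( z_i - H_i \hat{e} \right), \\
P &\leftarrow \left( I - K_i H_i \right) P .
\end{aligned}
```

Starting from $\hat{e} = 0$ and $P = \sigma^2 I$ after the first three matches, each further match tightens $P$, which plays the same role as the shrinking feasible region in the bounded-error case.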


These  methods  could  be  quite  useful  to  the  indexing  or 
alignment  approaches  to  recognition  that  we  previously 
described.  To  illustrate,  a  recognition  system  that  uses 
the  linear  programming  method  might  work  as  follows: 

1.  Match  k  image  and  model  points,  using  a  search  or 
indexing  method.  Assume  that  the  error  in  each 
image  point  is  bounded  by  some  m-sided  polygon. 

2.  Use  the  matches  to  generate  km  linear  constraints 
on  the  possible  errors  in  the  first  three  matched 
points. 

3.  For each unmatched model point, run $l$ linear programs
to compute an $l$-gon bounding the point's possible image
locations. Look there for a matching image point.
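
The final "look there" step of this procedure reduces to a point-in-convex-polygon test once each linear program has produced a limit in its direction. The sketch below assumes the polygon is stored as a list of (direction, limit) pairs relative to the point's nominal projected position; that representation and the function names are illustrative choices.

```python
def inside_polygon(candidate, nominal, bounds, tol=1e-9):
    """True if candidate lies within every half-plane d . (p - nominal) <= limit."""
    px = candidate[0] - nominal[0]
    py = candidate[1] - nominal[1]
    return all(dx * px + dy * py <= limit + tol for (dx, dy), limit in bounds)


def find_matches(image_points, nominal, bounds):
    """Return the sensed image points that fall inside the uncertainty polygon."""
    return [p for p in image_points if inside_polygon(p, nominal, bounds)]
```

For a rectangle, `bounds` has four entries with directions (1, 0), (-1, 0), (0, 1) and (0, -1); adding more directions tightens the search region at the cost of more linear programs per unmatched point.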

Recognition  systems  commonly  use  more  complex  fea¬ 
tures  such  as  line  segments  for  verification,  instead  of 
points.  It  would  be  straightforward  to  use  our  results  to 
bound  the  uncertainty  regions  of  line  segments  by  find¬ 
ing  the  uncertainty  regions  of  their  endpoints  (as  in  [4]). 
Additionally,  our  results  allow  us  to  measure  experimen¬ 
tally  the  extent  to  which  additional  point  matches  de¬ 
crease  our  uncertainty  about  the  location  of  unmatched 
points.  We  discuss  this  in  the  next  section. 

7  Experiments 

To  demonstrate  the  value  of  narrowing  the  feasible  region 
in  which  model  points  may  appear  by  using  additional 
matches,  we  have  implemented  a  test  system  for  the  case 
of  weak-perspective  projection.  In  this  system,  we  match 
some  image  and  model  points.  We  then  examine  the 
regions  in  which  additional  points  might  appear.  We 
can  see  how  much  smaller  these  feasible  regions  become 
as  we  derive  additional  constraints  from  more  matches. 

We  first  describe  the  results  of  this  system  on  syn¬ 
thetic  data.  This  allows  us  to  systematically  explore  two 
key  issues.  First,  since  our  linear  transformation  is  only 
an  approximation,  how  often  might  it  cause  us  to  make 
errors?  Second,  how  much  can  the  addition  of  further 
matches  reduce  the  space  in  which  we  must  search  for 
even  more  matches? 

We  use  the  following  experimental  conditions.  First, 
we  generate  random  sets  of  seven  model  points  inside 
a  cube.  We  form  an  image  by  projecting  these  points 
orthographically,  scaling  so  that  the  cube  projects  to  a 
1000  X  1000  square  image.  We  then  add  error  so  that 
each  sensed  point  shows  up  inside  a  circle  of  radius  five 
pixels  centered  at  the  projected  position  of  the  point. 
For  error,  a  uniform  random  distribution  is  used. 

We  then  match  the  first  three  noisy  image  points  and 
model  points,  and  use  this  match  to  generate  a  noisy 
pose  of  the  model.  This  pose  is  then  used  to  compute 
the  linear  transformations  describing  the  location  of  each 
additional  model  point  as  a  function  of  the  error  in  the 
first  three  image  points.  Of  the  two  possible  model  poses 
that  can  be  derived  using  a  match  of  three  points,  we 
automatically  select  the  one  that  is  closer  to  the  cor¬ 
rect  pose.  In  a  real  system,  of  course,  we  would  have 
to  explore  both  possibilities,  and  see  which  led  to  more 
confirming  evidence. 
Table 4: This shows the size of error regions computed as more points are matched, and the frequency with which
noisy model points fail to appear in these error regions. Image points are always perturbed by a uniform error
bounded by five pixels. The first column gives half the width of the square error bound that we allow for in each
image point. The second column gives the number of matches used in computing the error region for an additional
point. The third column gives the average size of this error region, and the fourth column gives the percentage of
times that a model point image shows up in this predicted error region.


Given  these  linear  transformations,  we  allow  for  a 
square  error  bound  of  width  10  pixels  around  each  error 
circle.  As  described  above,  we  then  compute  a  rectangu¬ 
lar  bound  on  the  image  location  of  each  additional  model 
point  using  linear  programming.  Next  we  check  to  see 
which  image  points  actually  appear  within  the  predicted 
rectangular  boundary.  Since  we  have  perturbed  the  im¬ 
age  points  within  their  error  circles,  any  time  we  fail  to 
find  a  model’s  image  point  in  the  predicted  rectangle, 
this  mistake  must  be  due  to  limitations  in  our  linear  ap¬ 
proximation.  When  we  do  find  an  image  point,  we  record 
the  size  of  the  rectangle  in  which  we  looked. 

We then augment our hypothesis by matching the
fourth image and model points, and, using the additional
constraints, we further narrow down the location of the
remaining model points in the image. Again we keep
track of how often we fail to find a model point in a
predicted rectangle, and we record the areas of the
successful rectangles. We continue this process with
additional matched points. We repeat this experiment 2,500
times, continuing to perturb each image point by up to
five pixels, but allowing for varying levels of error in our
predictions. We can use these results to see how many
mistakes of the system could be eliminated by overestimating
the expected error, and how much of a price we
would pay for this by producing larger rectangles.

Table  4  lists  the  results.  There  are  several  conclu¬ 
sions  we  may  draw.  First,  we  see  that  few  overall  mis¬ 
takes  are  made.  The  predictions  are  generally  between 
95%  and  99%  accurate.  The  significance  of  this  will  de¬ 
pend  on  exactly  how  we  incorporate  these  error  regions 
into  a  recognition  algorithm.  But  typically,  recognition 
systems  search  through  many  hypothetical  matches  be¬ 
tween  image  and  model  points,  and  it  is  understood  that 
a  system  may  have  to  consider  more  than  one  correct 
hypothesis  before  recognizing  an  object.  This  is  because 
even  a  correct  set  of  matches  may  lead  to  an  inaccu¬ 
rate  pose.  We  can  quantitatively  see  that  our  method 
of  computing  error  regions  leads  to  few  such  unstable 
poses. 

Second,  we  can  see  that  additional  matches  do  pro¬ 
vide  considerable  extra  constraint  in  determining  the  lo¬ 
cations  of  unmatched  points.  The  most  dramatic  effect 
occurs  when  one  matches  a  fourth  point.  This  can  reduce 
the  size  of  possible  error  regions  by  a  factor  of  fifteen  or 
more.  But  even  after  the  fourth  point,  there  is  a  con¬ 
tinuing  significant  benefit  in  using  additional  matches  to 
constrain the error regions. These results also help us
assess, in general, the stability of poses generated by
a small number of feature matches. We can see that if
we use three points to compute a pose, small changes in
these points can result in large changes in the locations
of additional points. Poses computed from more points
would be much more stable.

The  error  regions  we  compute  are  in  general  quite 
large.  Several  factors,  however,  may  have  exaggerated 
the  sizes  of  the  error  regions.  First,  if  we  truly  wish  to 
consider  error  as  bounded  within  a  disc  we  should  use 
polygonal  error  regions  that  more  closely  approximate 
a  circle.  Rectangles  were  used  in  this  example  only  for 
the  sake  of  simplicity.  Second,  a  uniform  error  distribu¬ 
tion  bounded  by  five  pixels  may  be  pessimistic.  In  real 
systems,  sensed  image  points  probably  tend  to  cluster 
around  the  point’s  true,  error-free  position,  and  the  er¬ 
ror  may  well  be  less  than  five  pixels.  Therefore  a  system 
that  allowed  for  less  error  may  produce  much  smaller  er¬ 
ror  regions,  without  making  many  mistakes.  Of  course, 
allowing  for  less  error  should  only  make  our  linear  ap¬ 
proximation  more  accurate. 

It  is  also  interesting  to  note  that  the  accuracy  of  the 
error  regions  drops  a  bit  as  we  add  more  point  matches. 
It  seems  that  errors  in  the  linear  approximation  accumu¬ 
late  as  we  compute  the  feasible  set  of  error  vectors.  One 
way  to  compensate  for  this  effect  would  be  to  allow  for 
slightly  more  error  as  we  use  more  matched  points.  For 
example,  if  we  allow  for  5  pixels  of  error  when  we  match 
three  points,  we  might  allow  for  6  pixels  when  matching 
four  points.  This  would  allow  us  to  significantly  reduce 
the  size  of  the  error  regions,  while  keeping  the  error  rate 
essentially  constant. 

As  before,  we  have  run  this  system  on  a  real  model  and 
image  to  further  illustrate  its  performance.  Fig.  12  shows 
the  resulting  rectangular  error  regions  for  e  =  5.25.  The 
figure  demonstrates  how  the  uncertainty  regions  shrink 
as  we  match  more  points,  while  still  containing  the  true 
image  points. 

In  summary,  we  have  used  linear  programming  to 
compute  the  propagated  uncertainty  regions  in  simula¬ 
tion  and  in  a  real  image  for  matches  with  more  than  three 
model  and  image  points.  The  experiments  demonstrate 
that  additional  matched  points  can  significantly  reduce 
the  uncertainty  regions  with  little  loss  in  accuracy. 


8  Applications 


We  have  shown  how  to  approximate  the  effect  of  changes 
in  model  pose  using  a  linear  relationship  between  the  er¬ 
ror  vectors.  For  predicting  the  locations  of  unmatched 
points,  we  have  demonstrated  that  this  approximation  is 
quite  good  within  the  range  of  error  usually  considered 
by  object  recognition  systems.  This  suggests  that  for 
many  recognition  applications  we  may  model  this  rela¬ 
tionship  linearly. 

In  past  research,  the  use  of  linear  projection  models 
has  led  to  algorithmic  simplicity.  Projection  of  a  3D 
object  may  be  approximated  as  a  3D-to-2D  affine  trans¬ 
formation  to  gain  the  advantages  of  linearity,  at  the  loss 
of  fully  capturing  the  rigidity  of  objects.  Also,  scaled- 
orthographic  projection  of  a  planar  object  is  equivalent 
to a 2D affine transformation of the object, which is
linear. Many algorithms have taken advantage of this
linearity, either to find matches that are consistent with a
bounded  error  model  (e.g.,  [6,  12,  10,  28])  or  to  find 
likely  sets  of  matches  assuming  Gaussian  error  (e.g., 
[36,  39,  47,  7]).  We  can  now  extend  some  of  these  algo¬ 
rithms  to  full  3D-from-2D  recognition  while  maintaining 
object  rigidity. 

In  Section  6,  we  outlined  an  algorithm  which  could 
be  useful  to  most  alignment  and  indexing  approaches 
to  recognition.  Some  alignment  approaches  use  group¬ 
ing  methods  to  generate  an  initial  match  of  more  than 
three  points  (such  as  Lowe’s  [34],  Roberts’  [37],  Ja¬ 
cobs’  [29],  and  Wayner’s  [45]),  and  some  alignment  ap¬ 
proaches  create  an  initial  alignment  using  only  three 
points  [37,  17,  34,  27,  44].  In  the  latter  case,  a  recog¬ 
nition  system  might  attempt  to  add  matches,  and  use 
these  additional  matches  to  narrow  the  area  in  which 
it  must  search  for  still  more  consistent  matches.  Addi¬ 
tionally,  the  algorithm  from  Section  6  may  be  useful  in 
methods  that  match  image  to  model  features  by  index¬ 
ing,  and  then  verify  these  matches  [32,  14,  29,  43,  38,  45]. 
In  these  approaches,  some  model  features  are  matched 
to  image  features  to  determine  a  model  pose,  and  then 
this  pose  is  used  to  find  matches  for  additional  model 
features.  Our  results  show  exactly  where  to  search  for 
these  matches  when  we  have  matched  three  image  and 
model  points. 

As  mentioned  above,  other  approaches  to  recognition 
have  derived  linear  constraints  on  model  poses  using  lin¬ 
ear  projection  models.  The  linear  constraints  were  used 
to  robustly  match  models  and  images  in  the  presence  of 
bounded  uncertainty.  This  line  of  work  originated  with 
Baird  [6],  who  considered  models  of  2D  points  under¬ 
going  2D  rotation,  translation,  and  scaling.  Baird  used 
convex  polygons  to  bound  the  errors  in  the  image  points. 
He  then  showed  that,  when  we  match  an  image  point  to 
a  model  point,  each  side  of  the  polygon  places  a  lin¬ 
ear  constraint  on  the  set  of  feasible  model  poses,  if  the 
transformation  from  matched  image  points  to  projected, 
unmatched  model  points  is  linear. 

Baird  used  these  constraints  as  part  of  an 
interpretation-tree  approach  to  recognition.  His  sys¬ 
tem  searched  a  tree  that  represented  all  possible  ways 
of  matching  image  and  model  points.  At  each  node  of 
the tree, linear programming was used to decide whether
the  proposed  matches  were  consistent  with  the  polygo¬ 
nal  error  bounds.  In  Section  6,  we  went  beyond  Baird, 
not  only  in  handling  the  scaled-orthographic  projection 
of  3D  objects,  but  also  in  showing  how  to  use  linear  pro¬ 
gramming  to  find  the  uncertainty  regions  of  unmatched 
points. 

Breuel  [10]  used  a  modification  of  Baird’s  approach 
to  produce  a  tree-search  algorithm  that  in  the  worst 
case  runs  in  polynomial  time.  Cass  [12]  used  linear  con¬ 
straints  to  show  that  finding  the  pose  of  the  model  that 
aligns  the  most  image  and  model  features  to  within  er¬ 
ror  bounds  is  inherently  a  polynomial  time  problem.  Ja¬ 
cobs  [28]  showed  how  to  perform  a  Hough  transform  in 
error  space,  instead  of  model  pose  space,  by  discretely 
computing  the  feasible  region  in  error  space  (the  space 
formed by the cross product of the error vectors in the
first three image points).

Figure 12: Rectangular uncertainty regions from matching more points. The rectangles in the top left image were
computed after matching only three points. The following images show the rectangles that resulted from successively
matching one more point. When an additional point was matched, we stopped computing rectangles for that point.

Jacobs  makes  use  of  a  linear  relationship  between  er¬ 
ror  vectors  that  exists  for  the  case  of  affine  transfor¬ 
mations  of  2D  objects.  Our  linearized  perspective  and 
weak-perspective  models  give  us  a  linear  relationship  for 
3D  objects  as  well.  As  a  consequence,  Jacobs’  method 
can readily be extended to 3D objects, without
increasing the dimensionality of the problem.

In  addition  to  Jacobs’  method,  our  linear  relationship 
can be used to extend any of the above methods. To illustrate,
we extend Cass' approach to the case of scaled-orthographic
projection. Suppose we match three image points
to three model points and wish to know which pose will
match  the  most  additional  model  and  image  points.  We 
know  that  these  three  matches  give  us  simple  linear  con¬ 
straints  on  the  first  three  error  vectors,  just  from  the 
image  points’  error  bounds.  Now  match  each  additional 
model  point  to  each  additional  image  point.  These  give 
us more linear constraints. Each linear constraint describes
a 5D hyperplane in a 6D space of the possible error
values: $e_{0,x}, e_{0,y}, e_{1,x}, e_{1,y}, e_{2,x}, e_{2,y}$. If we
take each set of six linear constraints, the constraints in
general  will  intersect  at  a  point.  This  point  corresponds 
to  a  set  of  error  vectors  for  the  first  three  points,  and 
hence  to  a  possible  pose  of  the  object.  As  Cass  showed, 
if  we  now  consider  all  of  these  poses,  we  will  be  guaran¬ 
teed  to  find  one  that  matches  the  most  model  points  to 
image points. In fact we will find all poses that match
the model to different collections of image points.
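The enumeration described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' or Cass' implementation: it uses a toy 2D error space instead of the 6D space of the text, each match contributes four half-plane constraints from a square error bound, and the helper names (`solve_square`, `count_satisfied`, `best_vertex`, `box`) are hypothetical. Every set of `dim` constraint hyperplanes is intersected, and the intersection point that satisfies the most matches is kept.

```python
from itertools import combinations

def solve_square(A, b, eps=1e-9):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.
    Returns None when the system is (numerically) singular."""
    n = len(A)
    aug = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def count_satisfied(matches, e, tol=1e-7):
    """Count the matches whose constraints a . e <= b all hold at point e."""
    return sum(
        all(sum(ai * ei for ai, ei in zip(a, e)) <= b + tol for a, b in cons)
        for cons in matches
    )

def best_vertex(matches, dim):
    """Intersect every set of `dim` constraint hyperplanes and keep the
    intersection point that satisfies the largest number of matches."""
    planes = [c for cons in matches for c in cons]
    best = (0, None)
    for combo in combinations(planes, dim):
        e = solve_square([a for a, _ in combo], [b for _, b in combo])
        if e is None:
            continue  # parallel hyperplanes do not meet in a point
        score = count_satisfied(matches, e)
        if score > best[0]:
            best = (score, e)
    return best

def box(cx, cy, h):
    """Toy error bound: four half-planes of a square centered at (cx, cy)."""
    return [((1, 0), cx + h), ((-1, 0), h - cx),
            ((0, 1), cy + h), ((0, -1), h - cy)]

# Two overlapping bounds and one far away: at most two matches can agree.
matches = [box(0, 0, 1), box(1.5, 0, 1), box(10, 10, 1)]
score, vertex = best_vertex(matches, dim=2)
```

The brute-force loop makes the cost visible: the number of candidate points grows as the number of constraints raised to the dimension of the error space, which is why the 6D case is expensive.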

Cass  has  developed  efficient  heuristic  algorithms  for 
exploring  the  space  of  poses  in  the  case  of  a  rigid  2D 
rotation  and  translation.  For  3D  recognition,  the  al¬ 
gorithm  becomes  costly,  however.  In  our  case,  if  we 
have  m  model  points  and  n  image  points,  for  each  of 
the $O(m^3 n^3)$ initial matches we must consider $O(m^6 n^6)$
poses.  Hopefully,  however,  a  polynomial  time  formula¬ 
tion  of  matching  for  scaled-orthographic  projection  may 
lead  to  more  efficient  heuristic  solutions,  such  as  the  ones 
that  Cass  has  found  in  other  domains.  Alternately,  Ja¬ 
cobs’  approach  of  performing  a  Hough  transform  in  er¬ 
ror  space  may  be  more  effective  since  error  space  is  more 
compact. 

9  Conclusion 

This  work  will  allow  recognition  systems  to  accurately 
take  account  of  the  effects  of  sensing  error,  during  a 
process  that  finds  supporting  evidence  to  confirm  a  hy¬ 
pothetical  set  of  matches.  We  showed  that  a  linear 
approximation  to  scaled-orthographic  projection  is  ac¬ 
curate  when  reasonable  amounts  of  sensing  error  have 
occurred.  In  addition,  we  showed  how  to  compute  the 
propagated  uncertainty  regions  for  the  rigid  projection 
of a 3D model into a 2D image. The uncertainty regions
for three matched points are described by a simple
analytic expression, for the projection and error
model cases of (scaled-orthographic, Gaussian), (scaled-orthographic,
bounded), and (perspective, Gaussian).
When more points are matched, there are simple and efficient
algorithms for computing the uncertainty regions.
Table 5 summarizes the results.

                                 Uncertainty Region Solution
Projection, error model          3 matched points                 > 3 matched points
Scaled-orthographic, Gaussian    Circularly-symmetric Gaussian    Gaussian
Scaled-orthographic, Bounded     Circle                           Linear Programming
Perspective, Gaussian            Gaussian                         Gaussian
Perspective, Bounded             Linear Programming               Linear Programming

Table 5: Propagated uncertainty regions for circularly-symmetric Gaussian and bounded errors in the image points.
Our solution for the uncertainty region either is analytic, in which case a description of the analytic solution is given,
or is numerical, in which case the solution can be found by Linear Programming.

Both  analytic  results  and  experiments  have  demon¬ 
strated  the  value  of  accurately  computing  the  uncer¬ 
tainty  regions.  The  uncertainty  region  of  a  point  can 
vary  greatly  depending  on  both  the  model  geometry  and 
pose. Therefore, any naive approach that uses a hypothesized
pose to match additional model and image
points  is  likely  either  to  match  many  of  the  model  points 
to  image  points  they  could  not  have  produced,  or  to  miss 
many  image  points  they  could  have  produced. 

We found that uncertainty regions based on randomly
matching only three points tend to be quite large, which
supports past findings [21, 4] that such matches lead to many
false matches. We also observed, however, that uncertainty
regions shrink dramatically when we match even
one more point, and still further when we match more.
This  demonstrates  that  matching  larger  sets  of  points, 
while  being  careful  about  error,  can  produce  much  more 
accurate  recognition  systems. 

Finally, we extended several existing approaches to
handling error in recognition systems, which were previously
restricted to domains with linear projection models.
In future work, we plan to implement the
robust recognition systems outlined in Sections 2 and 6.


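Availability

To facilitate the use of the results in this paper, we have
made available our C code for computing the 3D pose
solution implied by 3 point correspondences under weak
perspective, the three scaled rotation matrices (Equation 27),
and the uncertainty circles (Equation 28). To
retrieve the code, ftp to "ftp.ai.mit.edu," then log in as
"anonymous," then cd to "pub/users/tda/," and then
get and uncompress "alignment-code.tar.Z."


A  Scaled-Orthographic Similarity
Transform

This appendix solves the following equations for A:

$$t'_\theta = A\,t_\theta \quad\text{and}\quad t'_\phi = A\,t_\phi.$$

Let $t_\theta = (x_\theta, y_\theta)$, $t_\phi = (x_\phi, y_\phi)$, $t'_\theta = (x'_\theta, y'_\theta)$, and $t'_\phi = (x'_\phi, y'_\phi)$.
Expanding the first row of each equation gives
$x'_\theta = a_{11}x_\theta + a_{12}y_\theta$ and $x'_\phi = a_{11}x_\phi + a_{12}y_\phi$, which implies

$$\begin{bmatrix} a_{11} \\ a_{12} \end{bmatrix} = T^{-1}\begin{bmatrix} x'_\theta \\ x'_\phi \end{bmatrix}, \quad\text{where}\quad T = \begin{bmatrix} x_\theta & y_\theta \\ x_\phi & y_\phi \end{bmatrix}.$$

Then from Equations 10-13,

$$\begin{aligned}
T^{-1} &= \left(\frac{s_0 r}{\cos^2\theta}\begin{bmatrix} -\sin\phi & \sin\theta\cos\phi \\ -\cos\theta\sin\theta\cos\phi & -\cos\theta\sin\phi \end{bmatrix}\right)^{-1} \\
&= \left(\frac{\cos^2\theta}{s_0 r}\right)\frac{1}{\cos\theta\,(\sin^2\phi + \sin^2\theta\cos^2\phi)}\begin{bmatrix} -\cos\theta\sin\phi & -\sin\theta\cos\phi \\ \cos\theta\sin\theta\cos\phi & -\sin\phi \end{bmatrix} \\
&= \frac{\cos\theta}{s_0 r\,(1 - \cos^2\theta\cos^2\phi)}\begin{bmatrix} -\cos\theta\sin\phi & -\sin\theta\cos\phi \\ \cos\theta\sin\theta\cos\phi & -\sin\phi \end{bmatrix}.
\end{aligned}$$

Lastly we multiply by

$$\begin{bmatrix} x'_\theta \\ x'_\phi \end{bmatrix} = \frac{s_0 r'}{\cos^2\theta}\begin{bmatrix} -\sin(\phi + \tau') \\ -\cos\theta\sin\theta\cos(\phi + \tau') \end{bmatrix}$$

to get

$$\begin{aligned}
\begin{bmatrix} a_{11} \\ a_{12} \end{bmatrix}
&= \left(\frac{s_0 r'}{\cos^2\theta}\right)\left(\frac{\cos\theta}{s_0 r\,(1 - \cos^2\theta\cos^2\phi)}\right)\begin{bmatrix} -\cos\theta\sin\phi & -\sin\theta\cos\phi \\ \cos\theta\sin\theta\cos\phi & -\sin\phi \end{bmatrix}\begin{bmatrix} -\sin(\phi + \tau') \\ -\cos\theta\sin\theta\cos(\phi + \tau') \end{bmatrix} \\
&= \frac{r'}{r}\left(\frac{1}{1 - \cos^2\theta\cos^2\phi}\right)\begin{bmatrix} \cos\tau' - \cos^2\theta\cos\phi\cos(\phi + \tau') \\ -\sin\tau'\sin\theta \end{bmatrix}.
\end{aligned}$$

This equation gives A, since $a_{21} = -a_{12}$ and $a_{22} = a_{11}$.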


B  Derivation  of  the  Perspective  Linear 
Transform 

Using perspective projection, this appendix derives a linear
transform that relates the errors in the third and fourth
points. First we compute the 3D position of $m_2$ in camera
coordinates, as given in Equation 34. In Fig. 11, let
the normal to the plane through $(c, i_0, i_1)$ be

$$n = (0, n_y, n_z) \tag{42}$$


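By Rodrigues' formula, the rotation of $p$ by $\theta$ about $n$ is

$$\begin{aligned}
&(\cos\theta)\,p + (1 - \cos\theta)(n \cdot p)\,n + \sin\theta\,(n \times p) \\
&\quad= \bigl(d\cos\theta + r\sin\theta\,(n_y\sin\phi - n_z\cos\phi), \\
&\qquad\; r\cos\phi\cos\theta + r(1 - \cos\theta)(n_y\cos\phi + n_z\sin\phi)\,n_y + n_z d\sin\theta, \\
&\qquad\; r\sin\phi\cos\theta + r(1 - \cos\theta)(n_y\cos\phi + n_z\sin\phi)\,n_z - n_y d\sin\theta\bigr)
\end{aligned} \tag{43}$$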
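As a sanity check on Equation 43, the following sketch compares a direct evaluation of Rodrigues' formula with the expanded components. It assumes, based only on the form of those components, that the rotated vector is p = (d, r cos φ, r sin φ) and that the axis is the unit vector n = (0, n_y, n_z); the function names and the numeric values are illustrative and not from the paper.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def rodrigues(p, n, theta):
    """Rotate p about the unit axis n by theta using Rodrigues' formula."""
    c, s = math.cos(theta), math.sin(theta)
    k = (1.0 - c) * dot(n, p)
    nxp = cross(n, p)
    return tuple(c * pi + k * ni + s * ci for pi, ni, ci in zip(p, n, nxp))

def expanded_components(d, r, phi, theta, ny, nz):
    """Componentwise expansion, assuming p = (d, r cos(phi), r sin(phi))
    and n = (0, ny, nz) with ny**2 + nz**2 == 1."""
    c, s = math.cos(theta), math.sin(theta)
    n_dot_p = r * (ny * math.cos(phi) + nz * math.sin(phi))
    return (d * c + r * s * (ny * math.sin(phi) - nz * math.cos(phi)),
            r * math.cos(phi) * c + (1.0 - c) * n_dot_p * ny + nz * d * s,
            r * math.sin(phi) * c + (1.0 - c) * n_dot_p * nz - ny * d * s)

# Illustrative values only.
d, r, phi, theta = 2.0, 1.5, 0.3, 0.9
ny, nz = math.cos(0.4), math.sin(0.4)
p = (d, r * math.cos(phi), r * math.sin(phi))
rotated = rodrigues(p, (0.0, ny, nz), theta)
expanded = expanded_components(d, r, phi, theta, ny, nz)
```

The two tuples should agree to rounding error, and the rotation should preserve the length of p.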

To compute the translation u, let a and b be the
(known) distances from c to $i_0$ and from c to $i_1$, respectively.
From the Law of Cosines,

»"■  =  (^4!^)  ■  («> 

*  =  ‘“'‘C 

where  ^oij  V'  ^  (Oj  ^)-  From  the  Law  of  Sines, 

In  total,  we  have 

{x,y,z)  =  m2 

=  c  +  Iv +  rcosi^,  rsin^)(46) 

Substituting  x,  y,  and  z  into  Equation  36  gives  mj . 

Next  we  take  a  first  approximation  to  x  and  y  in  Equa¬ 
tion  36  with  respect  to  9  and  (f>.  Let  =  ff  >  3/^  = 
and  similarly  for  y^,  x^,  y^,  y^,  y^,  and  Le.  From 
Equation  36, 
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X0  Z0(x-Cx) 


)■ 


:  +  /  C^  +  fy  J’ 

^\z  +  f  {z  +  fY  ) 

For  x<^  and  y^,  we  substitute  <j>  for  9  in  these  equations. 
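Using Equations 43-46,

$$\begin{aligned}
x_\theta &= -d\sin\theta + r\cos\theta\,(n_y\sin\phi - n_z\cos\phi) + L_\theta v_x \\
y_\theta &= -r\cos\phi\sin\theta + r\sin\theta\,(n_y\cos\phi + n_z\sin\phi)\,n_y + n_z d\cos\theta + L_\theta v_y \\
z_\theta &= -r\sin\phi\sin\theta + r\sin\theta\,(n_y\cos\phi + n_z\sin\phi)\,n_z - n_y d\cos\theta + L_\theta v_z
\end{aligned}$$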

Le  =  +  ^  + 

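$$\begin{aligned}
x_\phi &= r\sin\theta\,(n_y\cos\phi + n_z\sin\phi) \\
y_\phi &= -r\sin\phi\cos\theta + r(1 - \cos\theta)(-n_y\sin\phi + n_z\cos\phi)\,n_y \\
z_\phi &= r\cos\phi\cos\theta + r(1 - \cos\theta)(-n_y\sin\phi + n_z\cos\phi)\,n_z
\end{aligned}$$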


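The above equations give $t_\theta$ and $t_\phi$. By substituting
$r'$ for $r$, $d'$ for $d$, and $\phi + \tau'$ for $\phi$, we get $t'_\theta$ and $t'_\phi$.
Solving Equation 23 leads to

$$A = \frac{1}{x_\theta y_\phi - x_\phi y_\theta}\begin{bmatrix} y_\phi x'_\theta - y_\theta x'_\phi & -x_\phi x'_\theta + x_\theta x'_\phi \\ y_\phi y'_\theta - y_\theta y'_\phi & -x_\phi y'_\theta + x_\theta y'_\phi \end{bmatrix}$$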
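The closed form for A can be checked numerically. The sketch below assumes that Equation 23 is the pair of conditions A t_θ = t'_θ and A t_φ = t'_φ, as stated at the start of Appendix A; the function names and the test vectors are hypothetical. It builds the primed vectors from a known matrix and confirms that the formula recovers that matrix.

```python
def solve_linear_transform(t_theta, t_phi, tp_theta, tp_phi):
    """Return the 2x2 matrix A with A t_theta = tp_theta and A t_phi = tp_phi,
    using the closed form obtained by inverting the matrix [t_theta t_phi]."""
    x_t, y_t = t_theta
    x_p, y_p = t_phi
    xp_t, yp_t = tp_theta
    xp_p, yp_p = tp_phi
    det = x_t * y_p - x_p * y_t
    if abs(det) < 1e-12:
        raise ValueError("t_theta and t_phi are (nearly) parallel")
    return [[(y_p * xp_t - y_t * xp_p) / det, (-x_p * xp_t + x_t * xp_p) / det],
            [(y_p * yp_t - y_t * yp_p) / det, (-x_p * yp_t + x_t * yp_p) / det]]

def apply(A, v):
    """Multiply the 2x2 matrix A by the 2-vector v."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

# Illustrative check with a general (non-similarity) matrix.
A_true = [[0.8, -0.3], [0.5, 1.1]]
t_theta, t_phi = (1.0, 2.0), (-0.5, 1.5)
A_est = solve_linear_transform(t_theta, t_phi,
                               apply(A_true, t_theta), apply(A_true, t_phi))
```

Unlike the scaled-orthographic case of Appendix A, nothing here forces the recovered matrix to be a similarity transform, so all four entries are solved for.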


C  Gaussian  Error  Propagation 

For this appendix, we adopt Therrian's notation [42].
In general, let $x$ be a Gaussian random vector and let
$y = Mx + b$, where $M$ is $n \times m$. For random vectors $x$
and $y$, respectively, denote their expected values by $m_x$
and $m_y$ and their covariance matrices by $K_x$ and $K_y$.
Then

$$p_y = \frac{1}{(2\pi)^{n/2}\,|K_y|^{1/2}}\; e^{-\frac{1}{2}(y - m_y)^T K_y^{-1} (y - m_y)},$$

where $m_y = M m_x + b$ and $K_y = M K_x M^T$ [42]. In
our case, we have four two-dimensional Gaussian distributions,
corresponding to the errors in three matched
image points and one unmatched image point. This gives
eight uncorrelated Gaussian random variables, of which
we are taking a linear combination. When $i_0$, $i_1$, $i_2$, and
$i_3$ are normally distributed with standard deviations $\sigma_0$,
$\sigma_1$, $\sigma_2$, and $\sigma_3$, respectively, $K_x$ is a diagonal matrix
with on-diagonal elements $(\sigma_0^2, \sigma_0^2, \sigma_1^2, \sigma_1^2, \sigma_2^2, \sigma_2^2, \sigma_3^2, \sigma_3^2)$.
Further, the linear combination in Equation 27 is given
by


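$$M = \begin{bmatrix} a_{11} & a_{12} & b_{11} & b_{12} & c_{11} & c_{12} & 1 & 0 \\ a_{21} & a_{22} & b_{21} & b_{22} & c_{21} & c_{22} & 0 & 1 \end{bmatrix}, \tag{49}$$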


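where $n = 2$ and $m = 8$. Expanding $M K_x M^T$ leads to

$$\begin{aligned}
K_y ={}& \sigma_0^2\begin{bmatrix} a_{11}^2 + a_{12}^2 & a_{11}a_{21} + a_{12}a_{22} \\ a_{11}a_{21} + a_{12}a_{22} & a_{21}^2 + a_{22}^2 \end{bmatrix}
+ \sigma_1^2\begin{bmatrix} b_{11}^2 + b_{12}^2 & b_{11}b_{21} + b_{12}b_{22} \\ b_{11}b_{21} + b_{12}b_{22} & b_{21}^2 + b_{22}^2 \end{bmatrix} \\
&+ \sigma_2^2\begin{bmatrix} c_{11}^2 + c_{12}^2 & c_{11}c_{21} + c_{12}c_{22} \\ c_{11}c_{21} + c_{12}c_{22} & c_{21}^2 + c_{22}^2 \end{bmatrix}
+ \sigma_3^2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\end{aligned} \tag{50}$$

Under weak-perspective projection, $a_{21} = -a_{12}$,
$a_{22} = a_{11}$, $b_{21} = -b_{12}$, $b_{22} = b_{11}$, $c_{21} = -c_{12}$, and
$c_{22} = c_{11}$. Letting $s_0 = \sqrt{a_{11}^2 + a_{12}^2}$, $s_1 = \sqrt{b_{11}^2 + b_{12}^2}$,
and $s_2 = \sqrt{c_{11}^2 + c_{12}^2}$, the expression for $K_y$ simplifies
to

$$K_y = (s_0^2\sigma_0^2 + s_1^2\sigma_1^2 + s_2^2\sigma_2^2 + \sigma_3^2)\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \tag{51}$$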


Then

$$p_y = \frac{1}{2\pi\sigma^2}\; e^{-\frac{1}{2\sigma^2}(y - m_y)^T (y - m_y)},$$

where

$$\sigma^2 = s_0^2\sigma_0^2 + s_1^2\sigma_1^2 + s_2^2\sigma_2^2 + \sigma_3^2. \tag{52}$$
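The covariance propagation in this appendix is easy to verify numerically. The following sketch computes K_y = M K_x M^T for a diagonal K_x and checks that, when each 2x2 block of M has the weak-perspective form [[a11, a12], [-a12, a11]], the result is a multiple of the identity as in Equation 51. The function names and the numeric values are illustrative assumptions, not taken from the paper or its C code.

```python
def propagate_covariance(M, variances):
    """Return K_y = M K_x M^T for a diagonal K_x = diag(variances)."""
    rows = len(M)
    cols = len(variances)
    return [[sum(M[i][k] * variances[k] * M[j][k] for k in range(cols))
             for j in range(rows)]
            for i in range(rows)]

def similarity_block(a11, a12):
    """2x2 block with a21 = -a12 and a22 = a11 (the weak-perspective form)."""
    return [[a11, a12], [-a12, a11]]

def build_M(A, B, C):
    """Assemble the 2x8 matrix [A | B | C | I] of Equation 49."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    return [A[i] + B[i] + C[i] + identity[i] for i in range(2)]

# Illustrative values only.
A = similarity_block(0.6, -0.2)
B = similarity_block(0.3, 0.5)
C = similarity_block(-0.4, 0.1)
sigmas = [1.0, 1.5, 0.5, 2.0]                       # sigma_0 .. sigma_3
variances = [s * s for s in sigmas for _ in (0, 1)]  # each point has x and y
K_y = propagate_covariance(build_M(A, B, C), variances)

# Scalar predicted by Equation 51: sum of s_i^2 sigma_i^2, plus sigma_3^2.
predicted = ((0.6**2 + 0.2**2) * sigmas[0]**2 + (0.3**2 + 0.5**2) * sigmas[1]**2
             + (0.4**2 + 0.1**2) * sigmas[2]**2 + sigmas[3]**2)
```

With these inputs the off-diagonal entries of `K_y` vanish and both diagonal entries equal `predicted`, so the propagated error is circularly symmetric.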